How do I select parsers?

b1draper commented 4 months ago

Describe the problem:

When I use log2timeline or psteal to process a Windows 2016 server .vhd file, it appears to "skip" parsing the data within the inetpub folder. I've tried having psteal and log2timeline create a timeline by pointing them at the .vhd file and selecting the partition that I'd like to process. I've also tried pointing the tools at the mounted drive letter (mounted using Arsenal Image Mounter). Either way, the inetpub folder appears to be skipped. If I direct the tools to only the inetpub folder, everything within the folder gets processed and a timeline is generated containing the data. I only know that this data is missing from the whole-disk timeline because there are TA IPs within the inetpub timeline that are not within the whole-disk timeline. I also don't see any entries in the whole-disk timeline that come from anything within the inetpub folder, aside from MFT information about the files within the inetpub folder.

Seeing this data "missing" makes me question what else is being skipped and not included in the timelines I am creating.

To Reproduce:

The version of Plaso you used:

20220724

The operating system you are running Plaso on (Not the operating system of the image/files you're trying to analyze):

Ubuntu 18.04.2 LTS (WSL)

Steps to reproduce the behavior including command line and arguments and output:

I've used variations of the following command lines. I am using a Windows 2019 server running WSL Ubuntu.

psteal.py --workers 45 --source /mnt/f -w ./ServerName_timelineMountedCPartition.csv
log2timeline.py --workers 45 --source /mnt/f -w ./ServerName_timelineMountedC-L2T.cav
psteal.py --workers 45 --source ./ServernameOSdisk.vhd -w ./ServerName_timelineAllPartitions.csv
psteal.py --workers 45 --source ./mnt/f/inetpub -w ./ServerName_inetpub_timeline.csv (successfully creates a timeline of data within the inetpub folder)

The method you used to install Plaso: I used the process outlined on the website (readthedocs)

Expected behavior: I would expect everything to be within the timelines that I am creating.

joachimmetz commented 4 months ago

Given that you have mounted the file system is this a permissions issue? What do the logs / extraction warnings tell you?

joachimmetz commented 4 months ago

I also see that you are using version 20220724; this version is no longer supported. Please upgrade to the most recent version.

b1draper commented 4 months ago

@joachimmetz - Gotcha, I can update to the current version. I also tried this on another VM with, I believe, the current version of Plaso and noticed the same data appeared to be missing. I don't think that it's a permission issue, since I also tried directing the tools at the server .vhd file.

joachimmetz commented 4 months ago

The command you provided hinted that you are using a mounted file system:

psteal.py --workers 45 --source /mnt/f -w ./ServerName_timelineMountedCPartition.csv

I would need additional factual information; also see https://plaso.readthedocs.io/en/latest/sources/Troubleshooting.html

I cannot solve the issue if I cannot reproduce it.

b1draper commented 4 months ago

Thanks for the follow up. I'd thought it was permissions too but the following command works perfectly as expected, creating a timeline of data within that folder.

psteal.py --workers 45 --source ./mnt/f/inetpub -w ./ServerName_inetpub_timeline.csv (successfully creates a timeline of data within the inetpub folder)

This command references the vhd file directly, which I would think bypasses the possibility of a file permission issue

psteal.py --workers 45 --source ./ServernameOSdisk.vhd -w ./ServerName_timelineAllPartitions.csv

joachimmetz commented 4 months ago

Passing the ServernameOSdisk.vhd as source should have no permission issues with the files in the VHD.

b1draper commented 4 months ago

That's what I thought too. However, when I have it use ServernameOSdisk.vhd as the source, the inetpub folder doesn't get processed. The same thing happens when I mount the disk using Arsenal Image Mounter and launch WSL to use log2timeline or psteal to process the whole disk. But if I tell it to only do the inetpub folder "/mnt/f/inetpub", the files get parsed correctly.

By default, are psteal and log2timeline designed to skip the inetpub folder when processing the full disk? Is there a way to make sure it's processed?

joachimmetz commented 4 months ago

> the inetpub folder doesn't get processed.

What do you specifically mean by "doesn't get processed"? Aren't there filestat events for this folder?

Are the right parsers enabled for files in the inetpub folder?

b1draper commented 4 months ago

So, if I run the command psteal.py --workers 45 --source ./ServernameOSdisk.vhd -w ./ServerName_timelineAllPartitions.csv, I get a timeline that doesn't include any of the events contained within the logs from the inetpub folder. I do have metadata and FileStat info for the files within the folder, like I have below.

If I use the command psteal.py --workers 45 --source ./mnt/f/inetpub -w ./ServerName_inetpub_timeline.csv, the events within the .log files are processed, creating a timeline like the one below. From here I can see that there are events listed in this file that aren't included in the other one. I would expect the info below to be included in the timeline created above (below is an example of sanitized data): ![image](https://github.com/log2timeline/plaso/assets/44442120/7a8541c7-564e-495f-a9f5-59ff2eaaaba8)

joachimmetz commented 4 months ago

> I get a timeline that doesn't include any of the events contained within the logs from the inetpub folder.

Ah, that is something different from "inetpub folder doesn't get processed".

The parser needed for the log files likely does not get enabled in storage media image source mode but does get enabled in directory source mode. What does pinfo tell you about the parsers that are enabled? What if you add that parser to the preset (either via the config or the parsers command line option)?
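As a complement to pinfo, the CSV output itself can be checked for which parsers produced events. The following is only a rough sketch, not part of Plaso: it assumes the CSV has a header column named "parser" (adjust to match your own file), and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_events_per_parser(csv_file, parser_column="parser"):
    """Count rows per value of the parser column in a timeline CSV.

    The column name "parser" is an assumption about the CSV layout; adjust it
    to match the header of your own output file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[parser_column] for row in reader)


# Made-up sample rows for illustration only, not real Plaso output.
sample = io.StringIO(
    "datetime,parser,display_name\n"
    "2023-01-01T00:00:00,filestat,/inetpub/logs/u_ex230101.log\n"
    "2023-01-01T00:00:01,filestat,/inetpub/logs/u_ex230102.log\n"
    "2023-01-01T00:00:02,winevtx,/Windows/System32/winevt/Logs/System.evtx\n"
)

counts = count_events_per_parser(sample)
# A parser name that is absent from the counts produced no events.
print(counts.get("text/winiis", 0))
```

With a real timeline, open the CSV file and pass the file object instead of the in-memory sample. If only filestat-style entries show up for the inetpub paths, the log parser was most likely not enabled.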

Maybe useful reference https://plaso.readthedocs.io/en/latest/sources/user/Log2Timeline-Perl-%28Legacy%29.html

jleaniz commented 4 months ago

Drive-by comment: Presumably you have IIS logs in the inetpub folder that you want processed. When you pass the VHD file as the source file to process, Plaso will try to detect the OS for the image file and apply a default parser preset, since you did not specify any.

I believe the default preset for Windows images https://github.com/log2timeline/plaso/blob/main/plaso/data/presets.yaml#L160 does not include the IIS parser. You will have to explicitly include it as an argument (e.g. --parsers win7,text/winiis). This is handled differently if you give Plaso a path (directory) instead of an image file.
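To illustrate the idea of a preset not covering a parser, here is a minimal sketch in plain Python. The preset contents below are hypothetical and heavily trimmed; the real lists live in presets.yaml and should be checked there. The helper name expand_preset is made up for this example and is not a Plaso function.

```python
def expand_preset(name, presets):
    """Return the set of parser names a preset resolves to.

    Entries that are themselves preset names are expanded recursively.
    """
    resolved = set()
    for entry in presets.get(name, []):
        if entry in presets:
            resolved |= expand_preset(entry, presets)
        else:
            resolved.add(entry)
    return resolved


# Hypothetical, trimmed presets for illustration only. The real definitions
# are in Plaso's presets.yaml and contain more entries.
PRESETS = {
    "win_gen": ["lnk", "prefetch", "winreg"],
    "win7": ["recycle_bin", "winevtx", "win_gen"],
}

win7_parsers = expand_preset("win7", PRESETS)
# False here means the preset alone would not enable the IIS log parser.
print("text/winiis" in win7_parsers)
```

If the same check against the actual preset definition comes back negative, the parser has to be added explicitly, as in the --parsers win7,text/winiis example above.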

See log2timeline.py --parsers list for a full list of parsers.

This is my assumption from reading your description of the problem, I could be mistaken.

joachimmetz commented 4 months ago

Given the frequency of responses yesterday and none today, I assume this question has been answered. Closing, reopen if needed.

b1draper commented 4 months ago

Thanks for the insight; I was unaware of the winiis parser. I was under the impression that by not specifying a parser it would try everything or parse everything. The "basic-usage" paragraph on the linked page gave me that understanding https://plaso.readthedocs.io/en/latest/sources/user/Using-log2timeline.html. This method is also covered in the SANS 508 class as an "all-inclusive" approach.
When I tried using the "winiis" parser I got an error. I edited the /usr/share/plaso/presets.yaml file, adding winiis to the list of parsers included in win7_slow, like the others.
I invoked the command using the following syntax: log2timeline.py --workers 45 --parsers "win7_slow" --storage-file ./Servername_parsers.plaso ./ServernameOSdisk.vhd . That command gives me the error "Unknown Parser or Plugin names in element(s): "winiis"", after which processing is aborted. Is this an add-on module?

joachimmetz commented 4 months ago

> This method is also covered in the SANS 508 class as an "all-inclusive" approach.

We don't control the SANS 508 material. We have provided feedback about this multiple times before but have had no response from SANS. Tl;dr they assume too much and do not necessarily validate their material or keep it up to date.

Winiis is a plugin of the text parser; try --parsers text/winiis
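The parser/plugin naming can be sketched as follows. This is an illustration only, not Plaso's own parsing code: the helper names split_parser_expression and build_command are made up, and the command uses only the flags and file names already quoted in this thread.

```python
import shlex


def split_parser_expression(expression):
    """Split a comma-separated --parsers value into (parser, plugin) pairs.

    A plain name such as "win7" yields (name, None); "text/winiis" yields
    ("text", "winiis").
    """
    pairs = []
    for element in expression.split(","):
        element = element.strip()
        if not element:
            continue
        parser, _, plugin = element.partition("/")
        pairs.append((parser, plugin or None))
    return pairs


def build_command(expression, storage_file, source):
    """Build a log2timeline.py argument list from flags quoted in this thread."""
    return ["log2timeline.py", "--parsers", expression,
            "--storage-file", storage_file, source]


expression = "win7,text/winiis"
pairs = split_parser_expression(expression)
command = build_command(
    expression, "./Servername_parsers.plaso", "./ServernameOSdisk.vhd")
print(pairs)
print(shlex.join(command))
```

The point is that a plugin is addressed through its parent parser, which is why the bare name winiis is rejected while text/winiis is accepted.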

b1draper commented 4 months ago

Thanks for the feedback. I'll give your suggestion a try. The kitchen sink reference is also here. https://plaso.readthedocs.io/en/latest/sources/user/Using-log2timeline.html

joachimmetz commented 4 months ago

Kitchen sink means "almost anything one can think of" (not necessarily everything); it was a term used by the original author. I'll change the documentation to make it clearer that the preset is used by default.

joachimmetz commented 4 months ago

@b1draper PTAL and see if the changes in https://github.com/log2timeline/plaso/pull/4819 make it more clear

b1draper commented 4 months ago

That reads better, but I think what would make it clear is the part that you told me about: how log2timeline selects which parsers to use if nothing is defined. If it is a Windows volume, then it defaults to the win7 parser set as defined in the presets file. When you mentioned that to me, it helped clear up a bit of confusion. Also, when I listed the parsers, it is listed as winiis but is called text/winiis.

Thanks again for all your help

joachimmetz commented 4 months ago

makes sense, I'll see what I can add

b1draper commented 4 months ago

Below is my suggestion:

**Basic usage:**
The simplest, and perhaps the most common, way to run log2timeline.py is without specifying any additional parameters, only defining the output and input. The output is the path and filename of the storage file, while the input is the location of the source, whether that is a single file, a storage media image (disk image) or a directory such as a mount point. Launched this way, log2timeline.py will attempt to identify the type of data being parsed and auto-select the best parser or parser group. Using the selected parser(s), log2timeline will go through the entire data set, identifying and parsing artifacts that match the selected parser(s). The resulting Plaso storage file will contain the extracted information from supported artifact types that match the parsers that were used.

For example, if the source disk image is auto-identified as containing a Windows OS, then the win7 parser group will be auto-selected and log2timeline will be run with that parser group. This is the same as if --parsers "win7" had been specified when log2timeline.py was launched: **log2timeline.py --parsers "win7" --storage-file output input**. It is suggested that the list of parsers be reviewed to ensure that all desired artifact types are included. The parser definitions can be modified by editing the presets.yaml file.

log2timeline / plaso

How do I select parsers? #4813