To explain the workflow: when we are doing Windows forensics, we will often start by trying to find the login events (typically by filtering down to Windows event logs and the event IDs relevant to logon events).
Once we've identified the accounts that we believe have been compromised, and the logon/logoff times of those sessions, we will then filter down on the artefacts that come from that user's folder.
So if "john.domain" is the user that we believe was compromised, we will run a query in Elasticsearch to show all artifacts below C:\Users\john.domain for the period of the session.
Previously we used the filename field to do this by just using the query "filename: john.domain" (or sometimes "filename: Users AND filename: john*").
We are unable to use this type of query on path_spec due to the JSON formatting of the values stored within it; both Lucene and KQL queries return zero results.
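For illustration, here is a minimal sketch of the two query styles using the Python Elasticsearch client. The index name "plaso-case" and the client setup are assumptions (not something Plaso creates); the query strings mirror the ones above, and the comments describe the behaviour reported in this issue:

```python
# Sketch only: assumes an index named "plaso-case" populated by Plaso's
# Elasticsearch output module and the official elasticsearch Python client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# This style of query worked while "filename" was emitted as a plain field:
working = es.search(
    index="plaso-case",
    body={"query": {"query_string": {
        "query": "filename: Users AND filename: john*"}}},
)

# The same approach against path_spec finds nothing, because the whole JSON
# blob sits in a single field instead of being broken out into sub-fields:
broken = es.search(
    index="plaso-case",
    body={"query": {"query_string": {"query": "path_spec: john*"}}},
)

print(working["hits"]["total"], broken["hits"]["total"])
```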
Currently "path_sec" gets populated with JSON in ElasticSearch like such:
{"__type__": "PathSpec", "location": "/mnt/e/Cases/CaseName/Plaso/backupserver.body", "type_indicator": "OS"}
If I were building an ingestion pipeline for Elasticsearch, I would look at this JSON and break it down into its individual elements.
Therefore the "__type__" field (dropping the non-alphanumeric characters) would become "path_spec.type", with the value "PathSpec" (which seems redundant, mind you, since that appears to be the value in most cases); "location" would go into "path_spec.location" in Elasticsearch; and "type_indicator" would become "path_spec.type_indicator".
As a general rule, any field that has nested JSON in it should be broken out in a similar way, where the field becomes "field.subfield_name" (a rough sketch of this flattening rule follows below).
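A minimal sketch of that flattening rule in Python (the helper name is illustrative and not part of Plaso or Elasticsearch):

```python
import json


def flatten_json_field(field_name, json_value):
    """Breaks a JSON-valued field into dotted sub-fields (illustrative helper)."""
    flattened = {}
    for key, value in json.loads(json_value).items():
        # Keep alphanumerics and inner underscores, so "__type__" becomes
        # "type" while "type_indicator" stays as-is.
        clean_key = "".join(
            character for character in key
            if character.isalnum() or character == "_").strip("_")
        flattened[f"{field_name}.{clean_key}"] = value
    return flattened


path_spec_value = (
    '{"__type__": "PathSpec", '
    '"location": "/mnt/e/Cases/CaseName/Plaso/backupserver.body", '
    '"type_indicator": "OS"}')

print(flatten_json_field("path_spec", path_spec_value))
# {'path_spec.type': 'PathSpec',
#  'path_spec.location': '/mnt/e/Cases/CaseName/Plaso/backupserver.body',
#  'path_spec.type_indicator': 'OS'}
```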
@davidrudduck thanks for the write-up. Per our chat I'll have a look at this as part of #2940, since for Elasticsearch users there seems to be a need to be able to control the output fields.
One easy to implement option could be to add the display_name field to the Elasticsearch output.
A next step would be to change the Elasticsearch output module to support setting field names dynamically. The "magical" (pre-defined) field names would need to be defined somewhere, plus an option like '*' for all container attribute names.
Would it suffice to say that the field names just become a dotted version of the parent field that would otherwise have stored them?
So even if you are breaking out an XML-based field, if the original field is "xml_string", the dynamic fields resulting from the XML are "xml_string.userid", "xml_string.username", etc.?
In the case of path_spec the same would apply and it would just get a dotted version of the original field, so the JSON remains in path_spec but its JSON content is broken out into sub-fields such as path_spec.type, etc.?
> So even if you are breaking out an XML-based field, if the original field is "xml_string", the dynamic fields resulting from the XML are "xml_string.userid", "xml_string.username", etc.?
Yes. Additionally, generated fields would need to have a namespace to prevent collisions with existing event data field names. I was thinking evtx or equivalent.
> In the case of path_spec the same would apply and it would just get a dotted version of the original field, so the JSON remains in path_spec but its JSON content is broken out into sub-fields such as path_spec.type, etc.?
Not sure yet; I think for your use case display_name could suffice. display_name is a field that exists in other output formats. It's the "path" you see in the output of log2timeline.py/psteal.py: a combination of the file system path and the parent path specification, e.g. VSS1:C\:Windows.
Otherwise an fs namespace could help, e.g. fs:path or event_source:path.
(context: https://plaso.readthedocs.io/en/latest/sources/user/Scribbles-about-events.html)
Changes that add an option allowing the user to select additional fields in the Elasticsearch output: https://github.com/log2timeline/plaso/pull/3463
Closing this issue; the recent changes should cover the basic needs, and #2940 will further extend field formatting.
Description of request: Prior to 20200717, if we were investigating an incident and wanted to deep dive on a particular user's artifacts/events, we would use an Elasticsearch query like "filename: username" or "filename: username*" to filter down to a particular folder path such as "C:\Users\username" or "C:\Users\username.domain".
As this field is no longer being populated, this is not possible. The path_spec field is instead populated with a JSON value rather than being broken out into sub-fields, which is not as easily searchable.
If the data in path_spec were broken out into sub-fields (path_spec.type, path_spec.location, etc.) instead of being lumped into one field as JSON, it would re-enable the ability to filter down based on the path of the artefact(s) we want to show events for.
Plaso version: 20200717
Operating system Plaso is running on:
Ubuntu 18.04 on WSLv1
Installation method: via PPA
Thoughts / suggestions
The alternative workaround for this would be to require Elasticsearch ingest pipelines to be built to break up the JSON values in the path_spec field (a rough sketch of such a pipeline is below); or alternatively just break path_spec out into path_spec.type, path_spec.location and path_spec.type_indicator, rather than dropping the entire JSON into a single value that then needs further processing to be usable.
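For completeness, a minimal sketch of such an ingest pipeline created via the Python Elasticsearch client, using Elasticsearch's built-in "json" ingest processor. The pipeline id, target field and index name are illustrative assumptions, not anything Plaso provides:

```python
# Sketch only: parses the path_spec JSON string into an object so its keys
# become searchable sub-fields.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.ingest.put_pipeline(
    id="plaso-expand-path-spec",  # illustrative pipeline name
    body={
        "description": "Expand the path_spec JSON blob into sub-fields.",
        "processors": [
            {
                "json": {
                    "field": "path_spec",
                    "target_field": "path_spec_expanded",  # illustrative name
                    "ignore_failure": True,
                }
            }
        ],
    },
)

# Documents indexed through the pipeline then expose sub-fields such as
# path_spec_expanded.location and path_spec_expanded.type_indicator.
doc = {
    "path_spec": (
        '{"__type__": "PathSpec", '
        '"location": "/mnt/e/Cases/CaseName/Plaso/backupserver.body", '
        '"type_indicator": "OS"}')
}
es.index(index="plaso-case", pipeline="plaso-expand-path-spec", body=doc)
```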