omerbenamram / evtx

A Fast (and safe) parser for the Windows XML Event Log (EVTX) format
Apache License 2.0

Feature/filtered_output #191

Closed janstarke closed 3 years ago

janstarke commented 3 years ago

This PR adds filtering of evtx records based on event ID or event data (using regular expressions). We use this regularly in our cases and have written a custom tool for it (https://github.com/janstarke/evtxgrep), but your code base provides many more features, so I tried to integrate the features we need into your tool.

forensicmatt commented 3 years ago

Just curious, why implement this at the library level? Why not just implement it at the interface level? I get that doing it at the iterator level might give you a little more performance, but I doubt it would be that noticeable. There are many ways to filter things, and doing it at this level adds extra overhead and unnecessary internal tweaks for anyone who opts to filter a different way. For example, on record collection, you could filter through JMESPath queries for additional filtering methods instead of just regexes and IDs.
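
As a rough illustration of what filtering at the interface level could look like, here is a minimal sketch. Everything in it is hypothetical: the Record struct, the Filter alias, and the helper functions are not part of the evtx crate, and a plain substring match stands in for a regular expression so the example needs no dependencies.

```rust
// Hypothetical, simplified record; NOT the evtx crate's actual record type.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    event_id: u32,
    data: String,
}

// A filter is just a boxed predicate, so callers can plug in any strategy
// (IDs, regexes, JMESPath, ...) without the parser knowing about it.
type Filter = Box<dyn Fn(&Record) -> bool>;

// Accept records whose event ID is in the given list.
fn by_event_id(ids: Vec<u32>) -> Filter {
    Box::new(move |r: &Record| ids.contains(&r.event_id))
}

// Plain substring match, used here as a stand-in for a regular expression.
fn by_data_substring(needle: &'static str) -> Filter {
    Box::new(move |r: &Record| r.data.contains(needle))
}

// Keep only the records that satisfy every filter.
fn apply_filters<I>(records: I, filters: Vec<Filter>) -> impl Iterator<Item = Record>
where
    I: Iterator<Item = Record>,
{
    records.filter(move |r| filters.iter().all(|f| f(r)))
}

// Made-up sample data for the demonstration.
fn sample_records() -> Vec<Record> {
    vec![
        Record { event_id: 4624, data: "user=admin logon".to_string() },
        Record { event_id: 4625, data: "user=guest failed".to_string() },
        Record { event_id: 1102, data: "admin cleared log".to_string() },
    ]
}

fn main() {
    let filters = vec![by_event_id(vec![4624, 4625]), by_data_substring("admin")];
    for record in apply_filters(sample_records().into_iter(), filters) {
        println!("{:?}", record);
    }
}
```

Because the predicates live outside the parser, someone who prefers a different filtering method only has to supply another closure.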

janstarke commented 3 years ago

I first started to implement this outside of the library (https://github.com/janstarke/evtxgrep), but I found that some symbols I needed for better performance were private :-( In particular, I didn't want to convert to XML/JSON and then parse the record back into a data structure before possibly filtering out that record...

ohadravid commented 3 years ago

Hi @janstarke :) Thanks for taking the time to open a PR and explain your use case. I think @forensicmatt is right here: this expands the scope (and complexity) of the crate a bit too much. A better approach might be to expose the structures your tool needs, so you (and others) could build whatever custom logic you want on top of this crate.
I think serialized_records might already be good enough for most of what you want, and because the tokens field is already public, you could implement a visitor/walk the "tree" directly on them without doing any text generation (which should be a lot faster, as you won't even need to clone the tokens or generate text).

If that doesn't work you could expose a parse_tokens for Records and the matching trait, but I don't think that would be needed.
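
To make the idea of walking the tokens without generating text more concrete, here is a minimal sketch. It assumes a heavily simplified, hypothetical Token enum; the crate's real token types carry more detail and are named differently, so treat this as an outline of the technique only.

```rust
// Hypothetical, heavily simplified token stream; NOT the evtx crate's types.
#[derive(Debug)]
enum Token<'a> {
    OpenElement(&'a str),
    Value(&'a str),
    CloseElement,
}

// Walk the tokens once and return the text of the first element named
// `wanted`, without building any XML or JSON along the way.
fn find_element_value<'a>(tokens: &[Token<'a>], wanted: &str) -> Option<&'a str> {
    // Stack of currently open element names.
    let mut stack: Vec<&'a str> = Vec::new();
    for token in tokens {
        match token {
            Token::OpenElement(name) => stack.push(*name),
            Token::Value(text) => {
                if stack.last().copied() == Some(wanted) {
                    return Some(*text);
                }
            }
            Token::CloseElement => {
                stack.pop();
            }
        }
    }
    None
}

// Decide whether a record passes an event-ID filter using only its tokens.
fn event_id_matches(tokens: &[Token<'_>], allowed: &[u32]) -> bool {
    find_element_value(tokens, "EventID")
        .and_then(|text| text.parse::<u32>().ok())
        .map_or(false, |id| allowed.contains(&id))
}

// Made-up token sequence for a single record.
fn sample_tokens() -> Vec<Token<'static>> {
    vec![
        Token::OpenElement("Event"),
        Token::OpenElement("System"),
        Token::OpenElement("EventID"),
        Token::Value("4624"),
        Token::CloseElement,
        Token::CloseElement,
        Token::OpenElement("EventData"),
        Token::OpenElement("Data"),
        Token::Value("admin"),
        Token::CloseElement,
        Token::CloseElement,
        Token::CloseElement,
    ]
}

fn main() {
    let tokens = sample_tokens();
    println!("keep record: {}", event_id_matches(&tokens, &[4624, 4625]));
}
```

The walk borrows the tokens and returns string slices into them, so nothing is cloned and no output text is produced for records that end up being filtered out.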

janstarke commented 3 years ago

I thought I had already tried that, but I will take another look at it. One other reason why I implemented it that way was that I wanted to benefit from the output options that evtx_dump already provides. Maybe we can find a way to filter AND use this functionality.