omerbenamram / evtx

A Fast (and safe) parser for the Windows XML Event Log (EVTX) format
Apache License 2.0

[Question] Alter JSON output #208

Open OlafHaalstra opened 2 years ago

OlafHaalstra commented 2 years ago

Dear Omer,

Awesome work on this library, it is really blazing fast.

I hope you can help me with the following question about the JSON serializer. I would like to alter the JSON that the parser outputs, and I am looking for the best way to do it.

By default it outputs something like this:

{
        "Event": {
            "EventData": {
                "Binary": null,
        ...
        "Event_attributes": {
            "xmlns": "http://schemas.microsoft.com/win/2004/08/events/event"
        }
}

I would like to append a few properties to it, e.g.:

{
        "Event": {
            "EventData": {
                "Binary": null,
        ...
        "Event_attributes": {
            "xmlns": "http://schemas.microsoft.com/win/2004/08/events/event"
        },
    "fields": {
        "host": "WIN-TEST",
        "source": "Setup.evtx",
        "time": 1623066248.0
    }
}

This should happen somewhere around the following snippet, where each record's data field is already a string (produced by the into_json function):

            EvtxOutputFormat::JSON => {
                for record in parser.records_json() {
                    self.dump_record(record)?;   
                }
            }

These are the solutions I could think of:

  1. Alter the string to insert the fields part.
    • Advantages
      • Easy to implement
      • Fast?
    • Disadvantages
      • Not flexible
      • Error prone
  2. Parse the record.data string to object with serde_json, alter it, and convert it to string again.
    • Advantages
      • Easy to implement
      • Flexible
      • Not prone to errors
    • Disadvantages
      • Compromises performance due to inherent inefficiency
  3. Implement own records_json function
    • Advantages
      • Fast?
      • Flexible
    • Disadvantages
      • I'm a terrible rust developer
      • Introduces a lot of code from your library which will be outdated
  4. insert even better solution here
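
To make the trade-offs of option 1 (altering the string) concrete, here is a minimal sketch that uses only the standard library. The helper name append_fields is hypothetical and not part of evtx. It assumes the record is a single serialized JSON object and that the fields argument is already valid JSON, and it validates neither, which is exactly the "error prone" part listed above.

```rust
/// Splice an extra `"fields"` member into a serialized JSON object.
///
/// Hypothetical helper (not part of evtx). Assumes `record_json` is one
/// JSON object and `fields_json` is valid JSON; neither is fully validated.
fn append_fields(record_json: &str, fields_json: &str) -> Option<String> {
    let trimmed = record_json.trim();
    // Only a top-level object can take an extra member.
    if !trimmed.starts_with('{') {
        return None;
    }
    // Drop the closing brace of the top-level object.
    let body = trimmed.strip_suffix('}')?.trim_end();
    // An empty object (`{`) needs no separating comma.
    let separator = if body.ends_with('{') { "" } else { "," };
    // `}}` in the format string is an escaped literal `}`.
    Some(format!("{}{}\"fields\":{}}}", body, separator, fields_json))
}
```

No parsing or re-serialization happens, so the cost is one string copy per record, but any malformed input passes through unnoticed.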

I'm asking for your advice because I wasn't able to figure out how to do this properly in Rust. Performance is also important to me, so I want to find a very efficient solution.

For solution (3) I already tried to implement something but that doesn't work. Maybe you can provide some guidance or you might even have a much better solution in mind.

// Stable shim until https://github.com/rust-lang/rust/issues/59359 is merged.
// Taken from proposed std code.
pub trait ReadSeek: Read + Seek {
    fn tell(&mut self) -> io::Result<u64> {
        self.seek(SeekFrom::Current(0))
    }
    fn stream_len(&mut self) -> io::Result<u64> {
        let old_pos = self.tell()?;
        let len = self.seek(SeekFrom::End(0))?;

        // Avoid seeking a third time when we were already at the end of the
        // stream. The branch is usually way cheaper than a seek operation.
        if old_pos != len {
            self.seek(SeekFrom::Start(old_pos))?;
        }

        Ok(len)
    }
}

impl<T: Read + Seek> ReadSeek for T {}

pub struct JsonSerialize<'a, T: ReadSeek> {
    settings: ParserSettings,
    parser: &'a mut EvtxParser<T>,
}

impl<T: ReadSeek> JsonSerialize<'_, T> {

    /// Return an iterator over all the records.
    /// Records will be JSON-formatted.
    pub fn records_json(
        &mut self,
    ) -> impl Iterator<Item = Result<SerializedEvtxRecord<String>, EvtxError>> + '_ {
        EvtxParser::serialized_records(self.parser, |record| record.and_then(|record| self.into_json(record)))
    }

    /// Consumes the record and parse it, producing a JSON serialized record.
    fn into_json(self, record: EvtxRecord) -> Result<SerializedEvtxRecord<String>, EvtxError> {
        let indent = self.settings.should_indent();
        let mut record_with_json_value = EvtxRecord::into_json_value(record)?;

        let data = if indent {
            serde_json::to_string_pretty(&record_with_json_value.data)
                .map_err(SerializationError::from)?
        } else {
            serde_json::to_string(&record_with_json_value.data).map_err(SerializationError::from)?
        };

        Ok(SerializedEvtxRecord {
            event_record_id: record_with_json_value.event_record_id,
            timestamp: record_with_json_value.timestamp,
            data,
        })
    }
}

omerbenamram commented 2 years ago

I would probably use jq for this: https://stackoverflow.com/questions/49632521/how-to-add-a-field-to-a-json-object-with-the-jq-command

It can also handle streams, if that's an issue: https://stackoverflow.com/questions/62825963/improving-performance-when-using-jq-to-process-large-files

OlafHaalstra commented 2 years ago

Preferably I want to have it baked into the code. Not sure where to start. Running into problems with option (2): apparently renaming fields is not trivial.

Replacing a value is quite easy with *v.get_mut("name").unwrap() = json!("Alice"); and so is adding something:

    let new_data = r#"{"name":"Alice"}"#;
    let new_value: JSONValue = serde_json::from_str(new_data)?;
    v["new"] = new_value;
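
On renaming: a JSON object behaves like a map, so a rename usually amounts to removing the old entry and inserting its value under the new key. Below is a minimal sketch that uses std's BTreeMap as a stand-in, so the example has no dependencies. The helper rename_key is hypothetical and not part of evtx or serde_json. serde_json's Map, reachable through Value::as_object_mut, offers the same remove and insert methods, so the same pattern should carry over.

```rust
use std::collections::BTreeMap;

/// Rename `old` to `new` by moving the value to the new key.
/// Returns false (and changes nothing) if `old` is absent.
/// Hypothetical helper; BTreeMap stands in for a JSON object here.
fn rename_key(map: &mut BTreeMap<String, String>, old: &str, new: &str) -> bool {
    match map.remove(old) {
        Some(value) => {
            // The value is moved, not cloned, so the rename is cheap.
            map.insert(new.to_string(), value);
            true
        }
        None => false,
    }
}
```

Note that an existing entry under the new key would be overwritten by the insert.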

forensicmatt commented 2 years ago

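@OlafHaalstra, you will want to create a custom tool around the evtx library and do something like this:

    // Assumes `path` (a Path or PathBuf) and `parser_settings` (a ParserSettings) are in scope.
    use evtx::EvtxParser;
    use serde_json::json;

    let mut evtx_parser = match EvtxParser::from_path(path) {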
        Ok(p) => p.with_configuration(parser_settings),
        Err(e) => {
            eprintln!("Error handling {}; {}", path.display(), e);
            return;
        }
    };

    for result in evtx_parser.records_json_value() {
        let record = match result {
            Ok(r) => r,
            Err(e) => {
                eprintln!("Error serializing event record: {}", e);
                continue;
            }
        };

        let mut json_value = record.data;
        json_value["source_file"] = json!(path.to_string_lossy());

        println!("{}", json_value);
    }

I am actually planning to make a YouTube video this week that will showcase just this along with things like recursing and parsing files in parallel. Subscribe and hit the bell so it will alert you when this video comes out (https://www.youtube.com/channel/UCudIWnSPimNaqMyGoKbaneQ)

forensicmatt commented 2 years ago

Baking this into the library is not a good idea. It's better to augment the data after you have parsed the raw record, because how you structure metadata around the parsed entry is a matter of personal preference.

forensicmatt commented 2 years ago

@OlafHaalstra I made a video that I think will answer your question. It also shows how to create a CLI around this library and tweak the JSON values: https://www.youtube.com/watch?v=yVeCAMQ5fZo