yampelo / beagle

Beagle is an incident response and digital forensics tool which transforms security logs and data into graphs.
MIT License

Speed up EVTX Parsing #42

Open: yampelo opened this issue 5 years ago

yampelo commented 5 years ago

Move EVTX parsing over to https://github.com/omerbenamram/pyevtx-rs.

omerbenamram commented 5 years ago

@yampelo let me know if you need a hand with this :)

yampelo commented 5 years ago

@omerbenamram It's mainly a question of whether I change the output of your tool to match what I was working off of before, or change all the functions to match the output of your tool. For example:

```python
proc = SysMonProc(
    host=event["Computer"],
    user=event["EventData_User"],
    process_guid=event["EventData_ProcessGuid"],
    process_id=int(event["EventData_ProcessId"]),
    process_image=process_image,
    process_image_path=process_path,
)
proc_file = proc.get_file_node()
proc_file.file_of[proc]

dest_addr = IPAddress(ip_address=event["EventData_DestinationIp"])

proc.connected_to[dest_addr].append(
    timestamp=event["EventData_UtcTime"],
    port=event["EventData_DestinationPort"],
    protocol=event["EventData_Protocol"],
)

if event.get("EventData_DestinationHostname"):
    hostname = Domain(event["EventData_DestinationHostname"])
    hostname.resolves_to[dest_addr].append(timestamp=event["EventData_UtcTime"])
    return (proc, proc_file, dest_addr, hostname)

return (proc, proc_file, dest_addr)
```

This code works off the flattened event format produced here: https://github.com/yampelo/beagle/blob/master/beagle/datasources/win_evtx.py#L58

omerbenamram commented 5 years ago

@yampelo The nice thing is that my package already produces valid JSON in Rust, so most of the code that is currently here https://github.com/yampelo/beagle/blob/master/beagle/datasources/win_evtx.py#L78 will go away (replaced with json.loads).
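A minimal sketch of what "replaced with json.loads" could look like, assuming each record is a dict whose "data" field holds the JSON string for one event (as the records yielded by pyevtx-rs's PyEvtxParser.records_json() do). The helper name load_events is hypothetical, and unwrapping a top-level "Event" key is an assumption about the output shape; check it against your own files.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def load_events(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: decode the JSON string carried in each record's "data" field."""
    for record in records:
        doc = json.loads(record["data"])
        # Assumption: the event body sits under a top-level "Event" key;
        # fall back to the whole document if it does not.
        yield doc.get("Event", doc)
```

With the package installed, usage would be along the lines of `from evtx import PyEvtxParser` followed by `for event in load_events(PyEvtxParser("Security.evtx").records_json()): ...`, where the file path is only a placeholder. Taking an iterable of records instead of a path keeps the decoding step independent of the parser, so it can be exercised without an EVTX file.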

As for these snippets: to be compatible with my output, it's merely a matter of changing event["EventData_UtcTime"] to event["EventData"]["UtcTime"], which is how the fields are actually represented in the event. Alternatively, you could adapt the JSON output to be flat so that it matches the current code. I think the former option is slightly nicer, but both should do the trick.
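To illustrate the first option, here is a sketch of reading the same Sysmon network-connection fields the snippet above uses, but through nested access. The helper connection_fields is hypothetical and assumes event is the object that contains "EventData" (for example after unwrapping any outer "Event" key).

```python
from typing import Any, Dict


def connection_fields(event: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: read network-connection fields from a nested event."""
    data = event["EventData"]  # flat format used event["EventData_<Field>"] instead
    return {
        "timestamp": data["UtcTime"],
        "dest_ip": data["DestinationIp"],
        "port": data["DestinationPort"],
        "protocol": data["Protocol"],
        # Optional field, mirroring event.get("EventData_DestinationHostname")
        "hostname": data.get("DestinationHostname"),
    }
```

The change is mechanical: each event["EventData_X"] lookup becomes data["X"], and optional fields keep using .get().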

You could use a snippet that flattens the data (e.g. https://stackoverflow.com/questions/6027558/flatten-nested-dictionaries-compressing-keys) to make this a drop-in replacement.
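For the second option, a recursive flattener in the spirit of that Stack Overflow answer might look like the sketch below. Joining keys with "_" is chosen so that {"EventData": {"UtcTime": ...}} becomes "EventData_UtcTime", matching the keys the existing code reads. This is an assumption about how the keys should line up: fields the current code reads unprefixed (such as event["Computer"]) may sit under another parent key in the JSON output and would then need separate handling.

```python
from typing import Any, Dict


def flatten(d: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Collapse nested dicts into a single level, joining keys with `sep`."""
    items: Dict[str, Any] = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined key prefix down.
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items
```

For example, flatten({"Computer": "HOST", "EventData": {"UtcTime": "t"}}) returns {"Computer": "HOST", "EventData_UtcTime": "t"}.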

So it's really up to you :) But if I can help in any way, I'd be happy to see this go through. You'd be very surprised by the performance difference if you haven't tried it already.