google / timesketch

Collaborative forensic timeline analysis
Apache License 2.0

[Idea] Import arbitrary data into Timesketch #967

Closed jaegeral closed 4 years ago

jaegeral commented 5 years ago

Maybe it would be a cool thing to add the ability to import pcap files directly.

This is the outcome of a brainstorming session I had with some friends, not sure how useful it is and I will try to do some research how that could be accomplished.

jaegeral commented 5 years ago

https://www.elastic.co/blog/analyzing-network-packets-with-wireshark-elasticsearch-and-kibana

kiddinn commented 5 years ago

this would also be parsing the data, so is this not better to be done in plaso or some other tool and then exported directly into timesketch?

jaegeral commented 5 years ago

hm that is actually a great idea.

It might lead to a more generic question, if we want the ability to directly upload data to timesketch or always go plaso --> timesketch.

joachimmetz commented 5 years ago

> This is the outcome of a brainstorming session I had with some friends, not sure how useful it is and I will try to do some research how that could be accomplished.

Plaso had a pcap parser before, but it was not maintained and suffered from high memory usage, hence we removed it.

joachimmetz commented 5 years ago

The question is what information do you want to extract from the PCAP? And will this be the same every time?

kiddinn commented 4 years ago

and also whether you are intending this to be a "simple parser" that just does one line per network packet, or whether you are doing stream assembly and a single line per "session", or per TCP stream (in the case of TCP)... and then parsing the content to add to that line.
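To make the two granularities concrete, here is a minimal sketch using only the standard library. The packet tuples, field names, and both helper functions are hypothetical and not part of Timesketch or plaso. Grouping by endpoint pair is only a rough stand-in for real TCP stream reassembly, which also has to track sequence numbers and connection state.

```python
from collections import defaultdict

# Hypothetical packet records:
# (timestamp, src_ip, src_port, dst_ip, dst_port, payload_length)
PACKETS = [
    (1.0, "10.0.0.5", 49152, "93.184.216.34", 80, 120),
    (1.2, "93.184.216.34", 80, "10.0.0.5", 49152, 900),
    (2.0, "10.0.0.5", 49153, "10.0.0.1", 53, 40),
]


def per_packet_events(packets):
    """Return one timeline event per packet."""
    return [
        {"time": ts, "src": f"{sip}:{sp}", "dst": f"{dip}:{dp}", "bytes": size}
        for ts, sip, sp, dip, dp, size in packets
    ]


def per_session_events(packets):
    """Return one aggregated event per bidirectional endpoint pair."""
    sessions = defaultdict(
        lambda: {"first_seen": None, "last_seen": None, "packets": 0, "bytes": 0}
    )
    for ts, sip, sp, dip, dp, size in packets:
        # Sort the two endpoints so both directions share the same key.
        key = tuple(sorted([(sip, sp), (dip, dp)]))
        session = sessions[key]
        if session["first_seen"] is None or ts < session["first_seen"]:
            session["first_seen"] = ts
        if session["last_seen"] is None or ts > session["last_seen"]:
            session["last_seen"] = ts
        session["packets"] += 1
        session["bytes"] += size
    return dict(sessions)
```

The per-packet form keeps every timestamp but produces many more events; the per-session form is far more compact but only keeps the first and last time each conversation was seen.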

There is also the option of using turbinia and/or dftimewolf, which can run some parser and then automatically upload the data to timesketch.

jaegeral commented 4 years ago

Yeah, the first idea was packet by packet. Of course you can take every idea to the next level, but as you stated, other options exist or would make more sense. In that case I would cancel the issue, or rephrase it if it makes sense to have a documentation section on "how to get data xyz into timesketch", where xyz is a format that timesketch does not import natively.

Thoughts?

kiddinn commented 4 years ago

I'm wondering whether it makes sense to do something as simple as:

Regarding the second point, it is very easy to add a function that takes a data frame and uploads that to TS. We could write some documentation and a demo notebook to demonstrate how to use that helper function, or how to get your data into a data frame... since I'm not sure how familiar people are with that data structure. There are plenty of "native" methods for data frames: reading from SQL databases, reading JSON data, Excel, etc... and then other simple manual methods as well.

WDYT about that? We could even demonstrate how to easily parse network packet data and convert that into a DataFrame as an example of how to do this.

kiddinn commented 4 years ago

What this will do is that instead of implementing a parser in timesketch, which I don't really want to do, since timesketch is not about parsing, we simply add a better importer of data, to make it easier to import data... and then you can rely on all the other parsers out there and write a small, simple script to utilize that parser and dump data into TS
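As a rough illustration of such a "small, simple script", the sketch below converts the output of some other parser into JSON Lines text. It assumes that Timesketch's file import expects `message`, `datetime`, and `timestamp_desc` fields on each event; the helper name, the input rows, and their keys are hypothetical.

```python
import json


def to_jsonl(rows, time_key, message_key, timestamp_desc):
    """Serialize parser output rows as JSON Lines text, one event per line."""
    lines = []
    for row in rows:
        # Copy every remaining field through as an extra event attribute.
        event = {
            key: value
            for key, value in row.items()
            if key not in (time_key, message_key)
        }
        # Assumed field names for the import format (see the lead-in).
        event["datetime"] = row[time_key]
        event["message"] = row[message_key]
        event["timestamp_desc"] = timestamp_desc
        lines.append(json.dumps(event, sort_keys=True))
    return "\n".join(lines) + "\n"


# Hypothetical rows, as some external parser might emit them.
ROWS = [
    {"ts": "2019-08-01T12:00:00+00:00", "summary": "DNS query for example.com",
     "src_ip": "10.0.0.5"},
    {"ts": "2019-08-01T12:00:01+00:00", "summary": "HTTP GET /index.html",
     "src_ip": "10.0.0.5"},
]
```

Calling `to_jsonl(ROWS, "ts", "summary", "Network Log")` returns text that can be written to a `.jsonl` file; the parsing itself stays in whatever tool produced the rows.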

kiddinn commented 4 years ago

See #1004 for at least the initial version of the importer...

This could be used like so:

my_sketch.upload_data_frame(data, 'pcap_test', '{src_ip:s}:{src_port:d}->{dst_ip:s}:{dst_port:d} = {url:s}')

That works for a data frame; if you don't have one, you can do something like:

...
from scapy import all as scapy_all
...

packets = scapy_all.rdpcap(fh)

with client.UploadStreamer() as streamer:
    streamer.set_sketch(my_sketch)
    streamer.set_timestamp_description('Network Log')
    streamer.set_timeline_name('pcap_test_log')
    streamer.set_message_format_string(
        '{src_ip:s}:{src_port:d}->{dst_ip:s}:{dst_port:d} = {url:s}')

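    for packet in packets:
        # do something here
        ...
        # `data` and `layer` are not defined in this snippet; they are assumed
        # to be set in the elided step above (for example the payload layer and
        # the TCP/UDP layer of the packet). URL_RE is likewise assumed to be a
        # compiled regular expression defined elsewhere.
        timestamp = datetime.datetime.utcfromtimestamp(packet.time)
        for _, v in data.fields.items():
            for url in URL_RE.findall(str(v)):
                url = url.strip()
                # One timeline event per URL found in the packet.
                streamer.add_dict({
                    'time': timestamp,
                    'src_ip': packet.getlayer('IP').src,
                    'dst_ip': packet.getlayer('IP').dst,
                    'src_port': layer.sport,
                    'dst_port': layer.dport,
                    'url': url})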

And this will add the PCAP file content to Timesketch.
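The snippet above leaves `URL_RE` and the per-packet extraction undefined. The following standalone sketch shows one way that step could look, without scapy or a Timesketch server. The regular expression and the helper names are hypothetical, and rendering the message with `str.format` is an assumption about how the importer applies the format string.

```python
import datetime
import re

# Hypothetical pattern; the thread does not show how URL_RE is defined.
URL_RE = re.compile(r'https?://[^\s<>"]+')

# Same format string as in the example above.
MESSAGE_FORMAT = '{src_ip:s}:{src_port:d}->{dst_ip:s}:{dst_port:d} = {url:s}'


def events_from_payload(epoch_seconds, src_ip, src_port, dst_ip, dst_port, payload):
    """Build one event dict per URL found in a raw packet payload."""
    text = payload.decode('utf-8', errors='replace')
    timestamp = datetime.datetime.fromtimestamp(
        float(epoch_seconds), tz=datetime.timezone.utc)
    return [
        {
            'time': timestamp,
            'src_ip': src_ip,
            'dst_ip': dst_ip,
            'src_port': src_port,
            'dst_port': dst_port,
            'url': url.strip(),
        }
        for url in URL_RE.findall(text)
    ]


def render_message(event):
    """Fill the message format string from an event dict (extra keys are ignored)."""
    return MESSAGE_FORMAT.format(**event)
```

Note that the `:d` specifiers require the ports to be integers; passing a port as a string raises a `ValueError` when the message is rendered this way.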

kiddinn commented 4 years ago

@deralexxx what do you think about this approach?

jaegeral commented 4 years ago

Wow, this is both general enough to cover a lot of different cases and simple enough to serve starters. I like it a lot, and it would also reduce the need to introduce importers for $format.

The other thing I noticed during this issue is that putting some more love into documentation would also raise awareness that e.g. plaso should be the go-to tool if you are looking for a specific importer and are not eager to develop something (a thing that I was missing, as plaso was not part of my workflow so far). It is like not re-implementing functions in awk that are already in grep when you can simply pipe them together, if that makes sense. I am happy to think about ways to do that, e.g. writing some sentences for the "import data" portion of the timesketch documentation.

kiddinn commented 4 years ago

yes, documentation is lacking ;)

and yes, we would love someone to fix that for us ;)

I will add some additional documentation alongside #1004 to document the upload streamer, at least some basic documentation there. But yes, we need more documentation for sure.

jaegeral commented 4 years ago

> yes, documentation is lacking ;)
>
> and yes, we would love someone to fix that for us ;)

on it

kiddinn commented 4 years ago

this is now submitted and ready to use... see the documentation here: https://github.com/google/timesketch/blob/master/docs/UploadDataViaAPI.md