log2timeline / plaso

Super timeline all the things
https://plaso.readthedocs.io
Apache License 2.0

Performance: export after nsrlsvr analysis very slow #1047

Open joachimmetz opened 7 years ago

joachimmetz commented 7 years ago

While running:

psort.py --analysis nsrlsvr --nsrlsvr-hash md5 --nsrlsvr-host 127.0.0.1 --nsrlsvr-port 9120 -w output.log --disable-zeromq test.plaso 

The export after the analysis is very slow: only a couple of events are exported per update cycle.

To do:

pettai commented 7 years ago

+1 I've noticed this too. The same happens if you do a second psort run with another analysis plugin, e.g. tagging.

kinky-it commented 7 years ago

The slow export after running the nsrlsvr analysis is caused by the fact that, in plaso 1.5.1, the zip file with tags is opened for each retrieval of a tag (zip_file.py#L1826). For dump files with large sets of tags (typically after running the nsrlsvr analysis plugin), the zip file is therefore opened thousands of times.

Even if the zip file with tags were kept open, the lack of seek support in the zip stream would cause similar trouble: the zip would have to be reopened every time the tag offset in the index is non-sequential. Reading the zip file just once and keeping it open as an in-memory file fixes the problems, and speeds up the export tremendously.
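A minimal sketch of that idea, using only the Python standard library. This is not plaso's actual code: the class name `InMemoryTagStream`, the function `read_at_reopening`, and the member name passed to them are hypothetical and only illustrate the difference between reopening the zip for every read and reading it once into a seekable in-memory buffer.

```python
import io
import zipfile


class InMemoryTagStream:
    """Hypothetical helper: loads one zip member into memory exactly once."""

    def __init__(self, zip_path, member_name):
        # Open the zip a single time and copy the whole member into a
        # BytesIO buffer, which supports cheap random-access seeks.
        with zipfile.ZipFile(zip_path, "r") as zip_file:
            self._stream = io.BytesIO(zip_file.read(member_name))

    def read_at(self, offset, size):
        """Returns size bytes starting at offset, in any order of offsets."""
        self._stream.seek(offset)
        return self._stream.read(size)


def read_at_reopening(zip_path, member_name, offset, size):
    """Slow baseline: reopens the zip and skips forward on every read."""
    with zipfile.ZipFile(zip_path, "r") as zip_file:
        with zip_file.open(member_name) as member:
            # Without relying on seek, the only way to reach the offset is
            # to decompress and discard everything before it.
            member.read(offset)
            return member.read(size)
```

With thousands of tags, the baseline pays the cost of opening the archive and decompressing up to the offset on every lookup, while the in-memory variant pays it once. The trade-off is that the whole member has to fit in memory.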

I have not checked thoroughly, but it seems that this is already fixed in master.

Onager commented 5 years ago

Not making September release, removing milestone.