target / huntlib

A Python library to help with some common threat hunting data analysis operations
MIT License

Multiprocessing support for SplunkDF #12

Closed DavidJBianco closed 4 years ago

DavidJBianco commented 4 years ago

Because the Splunk API is so incredibly slow, this branch ditches its oneshot() function in favor of the lower-level Splunk Jobs API. Since we had to write our own results retrieval code, we used Python's built-in multiprocessing module to retrieve results in parallel. The default is now to retrieve results with a single worker, which decreased total search time by about 45% when retrieving 1 million rows in testing.
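To illustrate the idea of paged, parallel results retrieval, here is a minimal sketch. It is not the code in this branch. `fetch_page`, `fetch_all`, and `TOTAL_ROWS` are hypothetical names, and `fetch_page` just fabricates rows locally where a real client would request one page of a finished search job by offset and count. The sketch uses `multiprocessing.pool.ThreadPool`, which has the same interface as `multiprocessing.Pool`, so that it runs without a `__main__` guard. Swapping in `Pool` should give a process-based variant.

```python
from functools import partial
from multiprocessing.pool import ThreadPool

TOTAL_ROWS = 1050  # size of the simulated result set (assumption for the demo)


def fetch_page(offset, page_size):
    """Hypothetical stand-in for one paged results request.

    Returns the rows in [offset, offset + page_size), clipped to TOTAL_ROWS.
    """
    end = min(offset + page_size, TOTAL_ROWS)
    return [{"row": i} for i in range(offset, end)]


def fetch_all(total_rows, page_size=100, processes=1):
    """Retrieve every page and return the rows in their original order."""
    offsets = range(0, total_rows, page_size)
    worker = partial(fetch_page, page_size=page_size)

    if processes <= 1:
        # Single worker: walk the pages sequentially.
        pages = [worker(offset) for offset in offsets]
    else:
        # Several workers: map() hands out offsets and keeps result order.
        with ThreadPool(processes) as pool:
            pages = pool.map(worker, offsets)

    # Flatten the list of pages into one list of rows.
    return [row for page in pages for row in page]


rows = fetch_all(TOTAL_ROWS, page_size=100, processes=4)
print(len(rows))
```

Because `map()` preserves input order, the single-worker and multi-worker paths return identical lists, which makes it straightforward to run the same checks against both.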

This PR also includes some code cleanups to improve the flow between SplunkDF.search_df() and SplunkDF.search(), and adds both uniprocessing and multiprocessing versions of the same unit tests.