google / timesketch

Collaborative forensic timeline analysis
Apache License 2.0

Sketch label aggregation scaling issue #3191

Closed jkppr closed 1 month ago

jkppr commented 1 month ago

Describe the bug

On a Timesketch deployment with many sketches, the datastore.get_filter_label() function can crash with an urllib3.exceptions.ProtocolError: ('Connection aborted.', HTTPException('got more than 100 headers')) error.

This happens because the Sketch API endpoint tries to get filter labels even for sketches that have no index (e.g. empty sketches). opensearchpy treats an empty index value as "search all indices". That is unexpected but usually harmless, until the number of tasks created to run the aggregation across all indices exceeds 100 and the response hits the maximum number of headers urllib3 can handle.

Affected code: https://github.com/google/timesketch/blob/master/timesketch/api/v1/resources/sketch.py#L472

To fix this, we should not request the filter labels for sketches without an index.
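
A minimal sketch of such a guard, assuming the endpoint has the sketch's index names available as a list before it queries the datastore. The helper name get_filter_labels_safely and the fetch_labels callable are hypothetical stand-ins for illustration, not the actual Timesketch code:

```python
from typing import Callable, List, Sequence


def get_filter_labels_safely(
    sketch_id: int,
    indices: Sequence[str],
    fetch_labels: Callable[[int, List[str]], List[str]],
) -> List[str]:
    """Return filter labels, skipping the datastore call for empty sketches.

    `fetch_labels` is a hypothetical stand-in for the datastore lookup.
    """
    # Drop empty or None entries so a sketch without timelines has no indices.
    active_indices = [index for index in indices if index]
    if not active_indices:
        # An empty index value would be treated as "search all indices",
        # so return early instead of querying the datastore.
        return []
    return fetch_labels(sketch_id, active_indices)


# Usage with a stub that records whether the datastore was queried.
calls: List[List[str]] = []


def stub_fetch(sketch_id: int, index_names: List[str]) -> List[str]:
    calls.append(index_names)
    return ["__ts_star"]


empty_result = get_filter_labels_safely(1, [], stub_fetch)
filled_result = get_filter_labels_safely(2, ["index_a", ""], stub_fetch)
```

With this guard, an empty sketch never reaches the datastore, so the aggregation cannot fan out across every index in the cluster.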