quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io

search nodes unresponsive with expensive aggregation #5305

Open PSeitz opened 1 month ago

PSeitz commented 1 month ago

As reported:

If I run an aggregation over a large time span, the searchers end up being killed by Kubernetes for missing health checks. I'd assume this is because CPU usage is too high and they are mostly unresponsive.

PSeitz commented 4 weeks ago

It seems there are two issues:

  1. Judging from the logs, the aggregation request is sent 50 times.
  2. The search thread pool takes all CPUs. This may not leave enough resources to answer health checks.

https://github.com/quickwit-oss/quickwit/pull/5304 addresses the second issue: the search thread pool now takes all threads except one, which leaves one thread free for other work such as health checks.
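
To illustrate the "all threads except one" sizing rule, here is a minimal sketch using only the Rust standard library. It is not the code from the PR: the function names `search_pool_size` and `detected_search_pool_size` are hypothetical, and the fallback to a single thread when the core count cannot be determined is an assumption made for this example.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: size a search thread pool to all available
/// cores except one, but never fewer than one thread.
fn search_pool_size(available_cores: usize) -> usize {
    // saturating_sub avoids underflow when the input is 0;
    // max(1) guarantees the pool still has a worker on 1-core machines.
    available_cores.saturating_sub(1).max(1)
}

/// Hypothetical helper: apply the rule to the core count reported by
/// the OS. Assumption: fall back to 1 core if detection fails.
fn detected_search_pool_size() -> usize {
    let cores = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    search_pool_size(cores)
}
```

On an 8-core node this rule gives 7 search threads. On a 1-core or 2-core node it gives a single search thread, so on a 1-core node nothing is actually reserved for health checks; a resource limit on the container may matter more there than the pool size.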