quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io

No `num_docs` optimization and lazy top K. #1693

Open fulmicoton opened 2 years ago

fulmicoton commented 2 years ago

If `num_docs` is not required, there are many optimizations we can run.

For instance, if we sort by doc ID, we can often search only one split and abort the search within that split. If we sort by `-date`, we can also order the splits by decreasing `max_date` and stop searching as soon as we have a guarantee that no further docs will enter the top K.
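To make the `-date` case concrete, here is a minimal sketch in Python. It is not Quickwit's implementation: the `Split` class, its fields, and `top_k_newest` are hypothetical names used only for illustration, and each split is modeled as an in-memory list of timestamps. Splits are visited from the largest `max_date` down, and the loop stops once the K-th best date collected so far is at least the `max_date` of the next split.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Split:
    """Hypothetical stand-in for a split: an id, its max_date, and doc dates."""
    split_id: str
    max_date: int
    dates: List[int]


def top_k_newest(splits: List[Split], k: int) -> Tuple[List[int], int]:
    """Return the k newest dates (descending) and the number of splits searched."""
    if k <= 0:
        return [], 0
    heap: List[int] = []  # min-heap holding the k newest dates seen so far
    searched = 0
    # Visit splits from the most recent max_date to the oldest.
    for split in sorted(splits, key=lambda s: s.max_date, reverse=True):
        # If the heap is full and its smallest date is already >= this split's
        # max_date, no doc from this split or any later one can strictly
        # improve the top K, so we can stop. Ties are treated as "no change".
        if len(heap) == k and heap[0] >= split.max_date:
            break
        searched += 1
        for date in split.dates:
            if len(heap) < k:
                heapq.heappush(heap, date)
            elif date > heap[0]:
                heapq.heapreplace(heap, date)
    return sorted(heap, reverse=True), searched


splits = [
    Split("a", 100, [90, 95, 100]),
    Split("b", 80, [70, 80]),
    Split("c", 200, [150, 200]),
]
top, searched = top_k_newest(splits, 2)
```

With K = 2, only split `c` needs to be searched here, because its two docs are both newer than the `max_date` of every other split. This shortcut is only valid when the total hit count is not requested, since counting hits would require visiting every split.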

We can hardcode these optimizations for the moment, and revisit this if someone comes up with a good formalism for a proper distributed execution plan abstraction.

trinity-1686a commented 7 months ago

We've added some optimizations to search fewer splits (but generally more than one) when `num_docs` isn't asked for. We also added the same optimization when sorting by date/-date/doc_id.