quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io

No count optimizations #5063

Open fulmicoton opened 3 months ago

fulmicoton commented 3 months ago

See if we have optimizations for the case where count is not requested.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-your-data.html#track-total-hits

track_total_hits: false on Elasticsearch

PSeitz commented 3 months ago

Yes, we don't scan all splits in some cases:

    // if client wants full count, or we are doing an aggregation, we want to run every splits.
    // However if the aggregation is the tracing aggregation, we don't actually need all splits.
    let run_all_splits = request.count_hits() == CountHits::CountAll
        || (request.aggregation_request.is_some()
            && !matches!(split_filter, CanSplitDoBetter::FindTraceIdsAggregation(_)));

This does not currently happen across indexes, so N indexes each with one split won't benefit from it.

Otherwise, I think the enum is missing some information needed for optimization. We may want to carry the number up to which we underestimate, so we can identify cases where we can remove searches completely. Currently we may keep counting even though we have already reached the threshold.

    pub enum CountHits {
        /// Count all hits, querying all splits.
        CountAll = 0,
        /// Give an underestimate of the number of hits, possibly skipping entire
        /// splits if they are otherwise not needed to fulfill a query.
        Underestimate = 1,
    }

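To make the idea of carrying the threshold concrete, here is a minimal sketch of an enum that holds the number up to which the count must be exact. It is not Quickwit's actual type: `CountHitsSketch`, `up_to`, `can_skip_split`, and the `split_needed_for_top_hits` flag are all hypothetical names introduced for illustration, and the real `CountHits` has explicit discriminants and no payload, so an actual change would likely look different.

```rust
/// Hypothetical variant of `CountHits` that carries the underestimate threshold.
/// Not part of Quickwit; for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CountHitsSketch {
    /// Count all hits, querying all splits.
    CountAll,
    /// The count only has to be exact up to `up_to`; past that point an
    /// underestimate is acceptable.
    Underestimate { up_to: u64 },
}

impl CountHitsSketch {
    /// Returns true if a remaining split can be skipped entirely.
    ///
    /// `hits_counted_so_far` is the number of hits already counted in other splits.
    /// `split_needed_for_top_hits` is an assumed input saying whether the split
    /// could still contribute to the returned top hits.
    pub fn can_skip_split(self, hits_counted_so_far: u64, split_needed_for_top_hits: bool) -> bool {
        if split_needed_for_top_hits {
            // The split is still required to answer the query itself.
            return false;
        }
        match self {
            CountHitsSketch::CountAll => false,
            // Once the threshold is reached, further counting adds nothing.
            CountHitsSketch::Underestimate { up_to } => hits_counted_so_far >= up_to,
        }
    }
}
```

Under this sketch, a request that needs no count at all could be expressed as `Underestimate { up_to: 0 }`, in which case every split that is not needed for the top hits can be skipped. Whether that matches how a no-count request should be modeled in Quickwit is an assumption.
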
fulmicoton commented 3 months ago

This ticket is about optimizing the case where no count at all is requested.