No count optimizations

fulmicoton commented 3 months ago

See whether we have optimizations for the case where the hit count is not requested at all.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-your-data.html#track-total-hits

This corresponds to track_total_hits: false on Elasticsearch.
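For reference, track_total_hits accepts true, false, or an integer threshold. With an integer, hits are counted accurately up to that number and the total is then reported as a lower bound. The Rust sketch below models how each mode affects the reported total. TrackTotalHits, ReportedTotal and reported_total are hypothetical names, and the exact boundary behaviour is an assumption, not Elasticsearch's code.

```rust
/// Hypothetical model of Elasticsearch's `track_total_hits` modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrackTotalHits {
    /// `true`: count every matching document.
    Exact,
    /// An integer: count accurately up to this many hits.
    UpTo(u64),
    /// `false`: do not report a total at all.
    Disabled,
}

/// What the response would report as the total (illustrative).
#[derive(Debug, PartialEq, Eq)]
enum ReportedTotal {
    /// The exact number of hits.
    Eq(u64),
    /// A lower bound: at least this many hits.
    Gte(u64),
    /// No total is reported.
    NotTracked,
}

fn reported_total(mode: TrackTotalHits, actual_hits: u64) -> ReportedTotal {
    match mode {
        TrackTotalHits::Exact => ReportedTotal::Eq(actual_hits),
        // Past the threshold, only a lower bound is reported.
        TrackTotalHits::UpTo(limit) if actual_hits > limit => ReportedTotal::Gte(limit),
        TrackTotalHits::UpTo(_) => ReportedTotal::Eq(actual_hits),
        TrackTotalHits::Disabled => ReportedTotal::NotTracked,
    }
}
```

Disabled is the case this issue targets: since no total is reported, nothing forces the engine to visit matches that cannot enter the top hits.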

PSeitz commented 3 months ago

Yes, we don't scan all splits in some cases:

    // If the client wants the full count, or we are running an aggregation, we need to search every split.
    // However, if the aggregation is the trace-ID aggregation, we don't actually need all splits.
    let run_all_splits = request.count_hits() == CountHits::CountAll
        || (request.aggregation_request.is_some()
            && !matches!(split_filter, CanSplitDoBetter::FindTraceIdsAggregation(_)));
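To make the condition above easy to check in isolation, here is a self-contained Rust sketch of the same decision. The types are simplified stand-ins written for this example: a boolean replaces aggregation_request, and CanSplitDoBetter keeps only the variants needed here, with a placeholder payload. They are not the real quickwit definitions.

```rust
/// Simplified stand-in for quickwit's `CountHits`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CountHits {
    CountAll,
    Underestimate,
}

/// Simplified stand-in: only the variants this sketch needs (payload is a placeholder).
#[allow(dead_code)]
enum CanSplitDoBetter {
    Uninformative,
    FindTraceIdsAggregation(Option<i64>),
}

/// Hypothetical request: `has_aggregation` replaces `aggregation_request.is_some()`.
struct SearchRequest {
    count_hits: CountHits,
    has_aggregation: bool,
}

/// Every split must be searched if a full count is requested, or if there is
/// an aggregation other than the trace-ID aggregation.
fn run_all_splits(request: &SearchRequest, split_filter: &CanSplitDoBetter) -> bool {
    request.count_hits == CountHits::CountAll
        || (request.has_aggregation
            && !matches!(split_filter, CanSplitDoBetter::FindTraceIdsAggregation(_)))
}
```

Note that a full count always wins: CountAll forces every split to be searched even for the trace-ID aggregation.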

This does not currently happen across indexes, so N indexes with one split each won't benefit from it.
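The cross-index limitation can be illustrated with a simplified model. If splits from all indexes were pooled into one list before pruning, a split whose best timestamp cannot beat the current top k could be skipped even when it is the only split of its index. The sketch assumes a sort by timestamp descending and represents each split by the timestamps of its matching documents. Split, max_timestamp and top_k_pooled are hypothetical names, and counting is ignored, so this only applies when a full count is not required.

```rust
/// Hypothetical, simplified split: the timestamps of its matching documents.
#[allow(dead_code)]
struct Split {
    index: &'static str,
    timestamps: Vec<i64>,
}

impl Split {
    /// Upper bound on any timestamp in the split (stand-in for split metadata).
    fn max_timestamp(&self) -> i64 {
        self.timestamps.iter().copied().max().unwrap_or(i64::MIN)
    }
}

/// Top-`k` timestamps (descending) over splits pooled from all indexes,
/// skipping splits that cannot improve the current top `k`.
/// Returns the hits and the number of splits actually searched.
fn top_k_pooled(splits: &[Split], k: usize) -> (Vec<i64>, usize) {
    let mut order: Vec<&Split> = splits.iter().collect();
    // Visit the most promising splits first, whatever index they belong to.
    order.sort_by_key(|s| std::cmp::Reverse(s.max_timestamp()));

    let mut top: Vec<i64> = Vec::new();
    let mut searched = 0;
    for split in order {
        if k > 0 && top.len() == k && split.max_timestamp() <= top[k - 1] {
            // This split and all remaining ones (lower maxima) cannot improve the top k.
            break;
        }
        searched += 1;
        top.extend(split.timestamps.iter().copied());
        top.sort_unstable_by(|a, b| b.cmp(a));
        top.truncate(k);
    }
    (top, searched)
}
```

With three indexes holding one split each (maxima 100, 70 and 50) and k = 2, the first split already yields [100, 90], so the other two are never searched; pruned per index, all three would be.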

Otherwise, I think the enum is missing some information that would enable further optimization. We may want to carry the number of hits up to which we are allowed to underestimate, so that we can identify cases where searches can be skipped entirely. Currently we may keep counting even though we have already reached the threshold.

    pub enum CountHits {
        /// Count all hits, querying all splits.
        CountAll = 0,
        /// Give an underestimate of the number of hits, possibly skipping entire
        /// splits if they are otherwise not needed to fulfill a query.
        Underestimate = 1,
    }
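One way to carry the missing information is to attach the bound to the underestimate variant. The Rust sketch below is a hypothetical variant of CountHits; CountHitsBound, count_needs_more_splits and can_skip_split are illustrative names, not quickwit APIs. Once the bound has been reached, a split that cannot improve the top hits no longer needs to be searched at all, not even for counting.

```rust
/// Hypothetical variant of `CountHits` that carries the number of hits up to
/// which the count must be exact (illustration only, not the quickwit enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CountHitsBound {
    /// Count all hits, querying all splits.
    CountAll,
    /// Exact up to `limit`; past that, an underestimate is acceptable.
    UnderestimateAbove { limit: u64 },
}

impl CountHitsBound {
    /// Whether the count alone still requires searching more splits,
    /// given the hits counted so far.
    fn count_needs_more_splits(self, hits_counted: u64) -> bool {
        match self {
            CountHitsBound::CountAll => true,
            CountHitsBound::UnderestimateAbove { limit } => hits_counted < limit,
        }
    }
}

/// A split can be skipped entirely when neither the top hits nor the count need it.
fn can_skip_split(mode: CountHitsBound, hits_counted: u64, split_can_improve_top_k: bool) -> bool {
    !split_can_improve_top_k && !mode.count_needs_more_splits(hits_counted)
}
```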

fulmicoton commented 3 months ago

This ticket is about optimizing the case where no count at all is requested.
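As an illustration of what the no-count case could unlock within a single split, assume the split's documents are stored in the order of the requested sort (for example, an index sorted by timestamp). Then the first k matching documents are already the top k, and without a count the rest of the split can be skipped. This is a hypothetical sketch (SplitResult and collect_sorted_split are illustrative names), not a description of how quickwit splits are laid out.

```rust
/// Hypothetical per-split result.
struct SplitResult {
    top_docs: Vec<u32>,
    /// `Some(exact count)` only when a count was requested.
    count: Option<u64>,
    docs_visited: usize,
}

/// Collects the top `k` matching docs from a split whose doc ids are
/// already in the requested sort order (an assumption of this sketch).
fn collect_sorted_split<F: Fn(u32) -> bool>(
    docs_in_sort_order: &[u32],
    matches: F,
    k: usize,
    count_requested: bool,
) -> SplitResult {
    let mut top_docs = Vec::with_capacity(k);
    let mut count = 0u64;
    let mut docs_visited = 0;
    for &doc in docs_in_sort_order {
        docs_visited += 1;
        if !matches(doc) {
            continue;
        }
        count += 1;
        if top_docs.len() < k {
            top_docs.push(doc);
        }
        // Without a count, the first k matches are already the top k,
        // so the rest of the split can be skipped.
        if !count_requested && top_docs.len() == k {
            break;
        }
    }
    SplitResult { top_docs, count: count_requested.then_some(count), docs_visited }
}
```

On ten documents where every even id matches, k = 2 returns [0, 2] either way, but the no-count path stops after visiting three documents instead of ten.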

quickwit-oss / quickwit

No count optimizations #5063