opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Support sub aggregation in filter rewrite optimization #12602

Open bowenlan-amzn opened 3 months ago

bowenlan-amzn commented 3 months ago

Follow-up task of #9310

Currently, sub aggregation is not supported in the filter rewrite optimization; only a single date histogram is supported. This makes the applicable scenarios very limited. It would be great if we could find a way to support sub aggregation while applying the filter rewrite optimization.

I noticed one possible path when applying the optimization to composite aggregation previously. There's an established pattern for deferring the sub aggregation collection. The idea is to do the aggregation collection in 2 passes: the 1st pass gets the docIdSets per bucket, and the 2nd pass runs the sub aggregation collection on these docIdSets per bucket.

https://github.com/opensearch-project/OpenSearch/blob/246557c4b5ee71187f2dc98ebfa93409a187037e/server/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java#L648-L673
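
To make the 2-pass idea concrete, here is a minimal, self-contained sketch in plain Java. It is not the `CompositeAggregator` code linked above and uses no OpenSearch or Lucene APIs. The class and method names (`TwoPassSubAggSketch`, `collectDocIdSets`, `collectSubAggregation`) are hypothetical, a `java.util.BitSet` stands in for a docIdSet, and a `max` stands in for the sub aggregation. The 1st pass loops over an in-memory array only as a placeholder; in the real optimization the matching docs per bucket would come from the index structure rather than from iteration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of deferred (2-pass) sub aggregation collection.
public class TwoPassSubAggSketch {

    // 1st pass: record which doc IDs fall into each date histogram bucket.
    // Assumption: this loop is a stand-in for resolving each bucket's range
    // against the index structure.
    static Map<Long, BitSet> collectDocIdSets(long[] timestamps, long interval) {
        Map<Long, BitSet> docIdSets = new TreeMap<>();
        for (int docId = 0; docId < timestamps.length; docId++) {
            long bucketKey = Math.floorDiv(timestamps[docId], interval) * interval;
            docIdSets.computeIfAbsent(bucketKey, k -> new BitSet()).set(docId);
        }
        return docIdSets;
    }

    // 2nd pass: replay the saved doc IDs of each bucket through the
    // sub aggregation (a max over a numeric field in this sketch).
    static Map<Long, Double> collectSubAggregation(Map<Long, BitSet> docIdSets, double[] values) {
        Map<Long, Double> maxPerBucket = new TreeMap<>();
        for (Map.Entry<Long, BitSet> entry : docIdSets.entrySet()) {
            BitSet docs = entry.getValue();
            double max = Double.NEGATIVE_INFINITY;
            for (int docId = docs.nextSetBit(0); docId >= 0; docId = docs.nextSetBit(docId + 1)) {
                max = Math.max(max, values[docId]);
            }
            maxPerBucket.put(entry.getKey(), max);
        }
        return maxPerBucket;
    }

    public static void main(String[] args) {
        long[] timestamps = {0L, 500L, 1000L, 1500L, 2500L};
        double[] values = {1.0, 5.0, 2.0, 3.0, 7.0};
        Map<Long, BitSet> docIdSets = collectDocIdSets(timestamps, 1000L);
        System.out.println(collectSubAggregation(docIdSets, values));
    }
}
```

The map of docIdSets has to stay alive between the two passes, which is where the extra memory cost of this approach would come from.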

Theoretically, the performance improvement still comes from using the index structure, instead of iteration, to get the matching docs to collect at the date histogram level. Sub aggregation collection on these matching docs is expected to run at the same speed as before. There would also be some memory cost from holding the docIdSets until the 2nd pass.

In the end, we expect a performance improvement on these 2 operations from the big5 workload. Both operations have sub-aggregations.

Some other issues would also improve sub-aggregation performance, but they work from the indexing side by computing special index structures, whereas this approach focuses on query-time improvement.

#3734

#12498

finnegancarroll commented 2 weeks ago

Picking up this issue.