opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Star Tree] [Search] Date histogram with metric aggregation #16552

Open sandeshkr419 opened 2 weeks ago

sandeshkr419 commented 2 weeks ago

Is your feature request related to a problem? Please describe

This request is to support star-tree search for date histogram aggregations that have nested metric aggregations.

Example query shape:

{
    "size": 0,
    "aggs": {
        "by_hour": {
            "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "hour"
            },
            "aggs": {
                "sum_status": {
                    "sum": {
                        "field": "status"
                    }
                }
            }
        }
    }
}
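
To make the query shape concrete, here is a minimal Python sketch of what it computes: documents are grouped into hour buckets and the `status` field is summed per bucket. The function name `sum_status_by_hour` and the sample documents are made up for illustration, and the sketch assumes all timestamps carry a UTC offset. It is not how OpenSearch executes the aggregation.

```python
from collections import defaultdict
from datetime import datetime


def sum_status_by_hour(docs):
    """Group docs into hour buckets and sum `status` per bucket (illustrative only)."""
    buckets = defaultdict(lambda: {"doc_count": 0, "sum_status": 0})
    for doc in docs:
        ts = datetime.fromisoformat(doc["@timestamp"])
        # Round down to the start of the hour, mirroring calendar_interval "hour".
        key = ts.replace(minute=0, second=0, microsecond=0).isoformat()
        buckets[key]["doc_count"] += 1
        buckets[key]["sum_status"] += doc["status"]
    # Sort by bucket key so the output is ordered by time (same offset assumed).
    return dict(sorted(buckets.items()))


# Hypothetical sample documents.
docs = [
    {"@timestamp": "2024-11-01T10:05:00+00:00", "status": 200},
    {"@timestamp": "2024-11-01T10:45:00+00:00", "status": 404},
    {"@timestamp": "2024-11-01T11:15:00+00:00", "status": 200},
]
result = sum_status_by_hour(docs)
for key, bucket in result.items():
    print(key, bucket)
```

Each document is visited once here, which is the per-document cost that a pre-aggregated structure aims to avoid.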

Describe the solution you'd like

Vanilla date histogram aggregations with no metric aggregations are already optimized and would most likely not benefit from star-tree optimization. It is specifically the cases where metric aggregations (sum, min, max, avg) are required that would gain performance benefits.
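
As a rough illustration of where the gain would come from, the sketch below pre-aggregates metrics per hour at build time, so a query reads one entry per bucket instead of every document. This is a deliberately simplified stand-in, not the actual star-tree data structure; `PreAgg`, `build_hourly_preagg`, and `hourly_metric` are hypothetical names, and timestamps are assumed to be epoch milliseconds.

```python
from dataclasses import dataclass

HOUR_MS = 3_600_000  # fixed-length hour in epoch milliseconds (assumption)


@dataclass
class PreAgg:
    """Pre-aggregated metrics for one hour bucket (hypothetical layout)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


def build_hourly_preagg(docs):
    """Build the table once, e.g. at index time. docs: iterable of (epoch_ms, value)."""
    table = {}
    for ts, value in docs:
        hour = ts - ts % HOUR_MS  # round down to the hour
        table.setdefault(hour, PreAgg()).add(value)
    return table


def hourly_metric(table, metric):
    """Answer sum/min/max/avg per hour from the pre-aggregated entries only."""
    readers = {
        "sum": lambda e: e.total,
        "min": lambda e: e.minimum,
        "max": lambda e: e.maximum,
        "avg": lambda e: e.total / e.count,  # avg is derived from sum and count
    }
    read = readers[metric]
    return {hour: read(entry) for hour, entry in sorted(table.items())}


table = build_hourly_preagg([(0, 200), (1_800_000, 404), (3_600_000, 200)])
print(hourly_metric(table, "sum"))
print(hourly_metric(table, "avg"))
```

Query cost in this sketch scales with the number of hour buckets rather than the number of documents, which is the kind of saving the metric-aggregation case is expected to see.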

Related component

Search:Aggregations

Describe alternatives you've considered

No response

Additional context

No response

sandeshkr419 commented 3 days ago

@msfroh - I brainstormed with @bharath-techie on how to resolve nested aggregations effectively and came up with this draft design PR, which uses a wrapper over LeafBucketCollector to resolve nested aggregations.

I also considered approaches that avoid the wrapper, but figuring out how to assign values to the buckets of sub-aggregations seemed tricky.

Please share your initial thoughts on the draft PR (please ignore the hard-coding for now). I specifically need comments on how values are assigned to the buckets of sub-aggregators.
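
For readers following along, the wrapper idea can be sketched roughly as below. This is not the code in the draft PR: `SumSubCollector`, `HistogramWrapper`, and `collect_entry` are hypothetical names, and the only thing borrowed from OpenSearch is the general pattern of a collector receiving a value together with the ordinal of its owning bucket. The wrapper maps each pre-aggregated entry to a histogram bucket ordinal and forwards the entry's metric value to the sub-aggregator under that ordinal.

```python
class SumSubCollector:
    """Hypothetical sub-aggregator: keeps one running sum per bucket ordinal."""

    def __init__(self):
        self.sums = {}

    def collect(self, value, bucket_ord):
        self.sums[bucket_ord] = self.sums.get(bucket_ord, 0) + value


class HistogramWrapper:
    """Hypothetical wrapper: decides the owning bucket, then delegates."""

    def __init__(self, sub_collector, interval_ms):
        self.sub_collector = sub_collector
        self.interval_ms = interval_ms
        self.bucket_ords = {}  # rounded timestamp -> bucket ordinal
        self.doc_counts = {}   # bucket ordinal -> number of documents

    def collect_entry(self, ts, pre_sum, pre_count):
        """Consume one pre-aggregated entry (timestamp, sum, doc count)."""
        key = ts - ts % self.interval_ms
        # Reuse the ordinal if the bucket exists, otherwise assign the next one.
        bucket_ord = self.bucket_ords.setdefault(key, len(self.bucket_ords))
        self.doc_counts[bucket_ord] = self.doc_counts.get(bucket_ord, 0) + pre_count
        # The sub-aggregator only sees the value and its owning bucket.
        self.sub_collector.collect(pre_sum, bucket_ord)


sub = SumSubCollector()
wrapper = HistogramWrapper(sub, interval_ms=3_600_000)
for entry in [(0, 604, 2), (3_600_000, 200, 1), (1_800_000, 100, 1)]:
    wrapper.collect_entry(*entry)
print(wrapper.bucket_ords, wrapper.doc_counts, sub.sums)
```

The point of the indirection in this sketch is that the sub-aggregator never needs to know how buckets are derived; it only receives a value and a bucket ordinal, which is the assignment step the comment above asks for feedback on.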