opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Search with nested field aggregations fails to complete and causes high CPU #16496

Open jameshinstac opened 3 hours ago

jameshinstac commented 3 hours ago

Describe the bug

Search queries against an index with nested field aggregations fail to complete, even if the index is empty.

CPU use rises to 50% on the first request and increases to 100% if another request is sent to the node.

The screenshot shows CPU use when the request is sent 6 times.

[Screenshot: NestedFieldAggregrationCPUUsage]

Related component

Search:Aggregations

To Reproduce

  1. Provision an OpenSearch cluster running version 2.16.0 or higher.
  2. Create index

PUT /nestedfieldindex -d '
{
  "mappings": {
    "properties": {
      "nested1": {
        "properties": {
          "nested2": {
            "properties": {
              "nested1": {
                "properties": {
                  "nested2": {
                    "properties": {
                      "field": { "type": "keyword", "fields": {} }
                    },
                    "type": "nested"
                  }
                },
                "type": "nested"
              }
            },
            "type": "nested"
          }
        },
        "type": "nested"
      }
    }
  }
}'

  3. Make search request

POST /nestedfieldindex/_search -d '
{
  "query": { "bool": {} },
  "aggregations": {
    "group": {
      "nested": { "path": "nested1.nested2" },
      "aggregations": {
        "group": {
          "composite": {
            "sources": [
              { "group_key": { "terms": { "field": "" } } }
            ]
          },
          "aggregations": {
            "count": {
              "nested": { "path": "nested1.nested2" }
            }
          }
        }
      }
    }
  }
}'

  4. Observe that the search request does not complete and that CPU use is high on at least one data node. Repeated attempts increase CPU use to 100%.
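The reproduction steps above can also be scripted. The following is a minimal sketch, not part of the original report: it assumes an unsecured node at http://localhost:9200 (a hypothetical address), and the helper names `nested`, `build_mapping`, `build_search`, `send` and `main` are made up for illustration. The request bodies mirror steps 2 and 3, including the empty "field" value, and the client-side timeout is there only so the script stops waiting if the search never returns.

```python
import json
import urllib.request

# Assumptions (not from the report): an unsecured local node at this URL.
BASE_URL = "http://localhost:9200"
INDEX = "nestedfieldindex"


def nested(properties):
    """Wrap a properties dict in a mapping of type "nested"."""
    return {"properties": properties, "type": "nested"}


def build_mapping():
    """Mapping from step 2: nested1 > nested2 > nested1 > nested2 > field."""
    leaf = {"field": {"type": "keyword", "fields": {}}}
    innermost = nested(leaf)
    level3 = nested({"nested2": innermost})
    level2 = nested({"nested1": level3})
    level1 = nested({"nested2": level2})
    return {"mappings": {"properties": {"nested1": level1}}}


def build_search():
    """Search body from step 3, keeping the empty "field" value as reported."""
    path = {"nested": {"path": "nested1.nested2"}}
    composite = {"sources": [{"group_key": {"terms": {"field": ""}}}]}
    inner = {"composite": composite, "aggregations": {"count": dict(path)}}
    return {
        "query": {"bool": {}},
        "aggregations": {"group": {**path, "aggregations": {"group": inner}}},
    }


def send(method, path, body, timeout=30):
    """Send a JSON request; raises on HTTP errors or when the timeout expires."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def main():
    """Create the index, then run the search that is reported to hang."""
    send("PUT", "/" + INDEX, build_mapping())
    # On an affected version this call is expected to end in a timeout error.
    print(send("POST", "/" + INDEX + "/_search", build_search()))
```

Calling `main()` against a test cluster runs both steps. The timeout only limits how long the script waits, so CPU use on the node still has to be checked separately.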

Expected behavior

It is expected that the search request would complete with the following result:

{
  "took": 436,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 0, "relation": "eq" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "group": {
      "doc_count": 0,
      "group": { "buckets": [] }
    }
  }
}
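When checking a fix, a response can be compared with the expected result above. The sketch below is illustrative only: `EXPECTED` and `matches_expected` are hypothetical names, and the comparison ignores "took" on the assumption that the timing varies from run to run.

```python
# Expected response from the report, without the run-dependent "took" field.
EXPECTED = {
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 0, "relation": "eq"},
        "max_score": None,
        "hits": [],
    },
    "aggregations": {"group": {"doc_count": 0, "group": {"buckets": []}}},
}


def matches_expected(response):
    """Return True if a parsed search response equals EXPECTED, ignoring "took"."""
    actual = {key: value for key, value in response.items() if key != "took"}
    return actual == EXPECTED
```

Here `response` is the search response already parsed from JSON into a dict, for example with `json.loads`.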

Additional Details

Plugins

The plugins listed below were enabled when testing with the Docker image. The issue also persists in environments with very few plugins enabled, so I don't think it is caused by a plugin.

opensearch-alerting opensearch-anomaly-detection opensearch-asynchronous-search opensearch-cross-cluster-replication opensearch-custom-codecs opensearch-flow-framework opensearch-geospatial opensearch-index-management opensearch-job-scheduler opensearch-knn opensearch-ml opensearch-neural-search opensearch-notifications opensearch-notifications-core opensearch-observability opensearch-performance-analyzer opensearch-reports-scheduler opensearch-security opensearch-security-analytics opensearch-skills opensearch-sql query-insights

Additional context

This bug appears to have been introduced in OpenSearch 2.16.0. Our testing didn't encounter the issue in OpenSearch 1.x or in 2.x versions before 2.16.0.

kkewwei commented 2 hours ago

@jameshinstac It may be fixed by #15931. Can you please try the newest version?