opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Deep level aggregations query hang the request #15914

Closed Roboteus closed 1 month ago

Roboteus commented 1 month ago

Describe the bug

OpenSearch 2.16.0 hangs while resolving a request: the client waits indefinitely for a response. The problem occurs only with the specific query included below. Version 1.3.19 handles the same query without hanging.

Related component

Search

To Reproduce

  1. Create index:
    PUT supplier2/
    {
    "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      },
      "normalizer": {
        "raw_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
    },
    "mappings": {
    "properties": {
      "dataSet": {
        "type": "keyword"
      },
      "supplierValueProperties": {
        "type": "nested",
        "properties": {
          "propertyName": {
            "type": "keyword"
          },
          "propertyType": {
            "type": "keyword"
          },
          "propertyStringValue": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              },
              "raw_normalized": {
                "type": "keyword",
                "normalizer": "raw_normalizer"
              }
            }
          },
          "propertyNumericValue": {
            "type": "double"
          },
          "propertyDateValue": {
            "type": "date",
            "format": "yyyy-MM-dd"
          },
          "propertyBooleanValue": {
            "type": "boolean"
          }
        }
      },
      "references": {
        "type": "nested",
        "properties": {
          "key": {
            "type": "keyword"
          },
          "value": {
            "type": "nested",
            "properties": {
              "id": {
                "type": "keyword"
              },
              "code": {
                "type": "text",
                "fields": {
                  "raw": {
                    "type": "keyword"
                  },
                  "raw_normalized": {
                    "type": "keyword",
                    "normalizer": "raw_normalizer"
                  }
                }
              },
              "referenceValueProperties": {
                "type": "nested",
                "properties": {
                  "propertyName": {
                    "type": "keyword"
                  },
                  "propertyType": {
                    "type": "keyword"
                  },
                  "propertyStringValue": {
                    "type": "text",
                    "fields": {
                      "raw": {
                        "type": "keyword",
                        "ignore_above": 30000
                      },
                      "raw_normalized": {
                        "type": "keyword",
                        "normalizer": "raw_normalizer",
                        "ignore_above": 30000
                      }
                    }
                  },
                  "propertyNumericValue": {
                    "type": "double"
                  },
                  "propertyDateValue": {
                    "type": "date",
                    "format": "yyyy-MM-dd"
                  },
                  "propertyBooleanValue": {
                    "type": "boolean"
                  }
                }
              }
            }
          }
        }
      }
    }
    }
    }
  2. Execute the query:
    POST supplier2/_search
    {
    "size": 1000,
    "query": {
    "bool": {
      "must": [
        {
          "term": {
            "dataSet": {
              "value": "basic",
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
    },
    "aggregations": {
    "reference_aggregation": {
      "nested": {
        "path": "references"
      },
      "aggregations": {
        "references.key": {
          "terms": {
            "field": "references.key"
          },
          "aggregations": {
            "referenceValueProperties": {
              "nested": {
                "path": "references.value.referenceValueProperties"
              },
              "aggregations": {
                "propertyName": {
                  "terms": {
                    "field": "references.value.referenceValueProperties.propertyName"
                  },
                  "aggregations": {
                    "propertyType": {
                      "terms": {
                        "field": "references.value.referenceValueProperties.propertyType"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    }
    }
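The relationship between the mapping in step 1 and the aggregation in step 2 can be checked with a short script. This is a minimal sketch, not part of the original report: the helper names `nested_paths` and `skipped_nested_levels` are hypothetical, and the mapping is a trimmed copy of the `references` subtree. It shows that the inner `nested` aggregation goes from `references` directly to `references.value.referenceValueProperties` without a `nested` aggregation on the intermediate `references.value` level. Whether that is what triggers the hang is not established in this report.

```python
def nested_paths(properties, prefix=""):
    """Return every field path in a mapping whose type is "nested"."""
    paths = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            paths.append(path)
        # Recurse into sub-properties, if the field has any.
        paths.extend(nested_paths(spec.get("properties", {}), prefix=f"{path}."))
    return paths


def skipped_nested_levels(parent_path, child_path, all_nested):
    """List nested paths that lie strictly between parent_path and child_path."""
    return sorted(
        p for p in all_nested
        if p.startswith(parent_path + ".") and child_path.startswith(p + ".")
    )


# Trimmed copy of the "references" part of the mapping from step 1
# (leaf fields that do not matter for nesting are omitted).
MAPPING_PROPERTIES = {
    "dataSet": {"type": "keyword"},
    "references": {
        "type": "nested",
        "properties": {
            "key": {"type": "keyword"},
            "value": {
                "type": "nested",
                "properties": {
                    "id": {"type": "keyword"},
                    "referenceValueProperties": {
                        "type": "nested",
                        "properties": {
                            "propertyName": {"type": "keyword"},
                            "propertyType": {"type": "keyword"},
                        },
                    },
                },
            },
        },
    },
}

ALL_NESTED = nested_paths(MAPPING_PROPERTIES)
# Paths used by the two "nested" aggregations in step 2.
SKIPPED = skipped_nested_levels(
    "references",
    "references.value.referenceValueProperties",
    ALL_NESTED,
)
print("nested paths:", ALL_NESTED)
print("levels skipped by the inner nested aggregation:", SKIPPED)
```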

Expected behavior

The request should resolve, as it does in this version:

    "version" : {
      "distribution" : "opensearch",
      "number" : "1.3.19",
      "build_type" : "zip",
      "build_date" : "2024-08-23T00:39:31.484729800Z",
      "build_snapshot" : false,
      "lucene_version" : "8.10.1",
      "minimum_wire_compatibility_version" : "6.8.0",
      "minimum_index_compatibility_version" : "6.0.0-beta1"
    }

Additional Details

Default installation, for example on Windows (the problem is not related to the OS):

    "version" : {
      "distribution" : "opensearch",
      "number" : "2.16.0",
      "build_type" : "zip",
      "build_date" : "2024-08-06T20:32:32.086481300Z",
      "build_snapshot" : false,
      "lucene_version" : "9.11.1",
      "minimum_wire_compatibility_version" : "7.10.0",
      "minimum_index_compatibility_version" : "7.0.0"
    }
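When reproducing the hang, a client-side timeout keeps the test client from blocking forever. The following is a minimal sketch using only the Python standard library. It assumes a local node reachable at `http://localhost:9200` over plain HTTP with no authentication (an assumption; a default installation may require HTTPS and credentials), and the function names are hypothetical. The timeout only stops the client from waiting; it says nothing about whether the server-side search task keeps running.

```python
import json
import socket
import urllib.error
import urllib.request

# Aggregation part of the query from the reproduction steps.
SEARCH_BODY = {
    "size": 1000,
    "query": {"bool": {"must": [{"term": {"dataSet": {"value": "basic"}}}]}},
    "aggregations": {
        "reference_aggregation": {
            "nested": {"path": "references"},
            "aggregations": {
                "references.key": {
                    "terms": {"field": "references.key"},
                    "aggregations": {
                        "referenceValueProperties": {
                            "nested": {"path": "references.value.referenceValueProperties"},
                            "aggregations": {
                                "propertyName": {
                                    "terms": {"field": "references.value.referenceValueProperties.propertyName"},
                                    "aggregations": {
                                        "propertyType": {
                                            "terms": {"field": "references.value.referenceValueProperties.propertyType"}
                                        }
                                    },
                                }
                            },
                        }
                    },
                }
            },
        }
    },
}


def build_search_request(base_url, index, body):
    """Build a POST <index>/_search request without sending it."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_with_timeout(request, timeout_seconds=30.0):
    """Send the request; return the parsed JSON, or None if the client timed out."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return json.load(response)
    except urllib.error.URLError as exc:
        # A timeout while connecting or sending is wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return None
        raise
    except socket.timeout:
        # A timeout while waiting for the response is raised directly.
        return None


# Assumed local endpoint; adjust to the cluster under test.
REQUEST = build_search_request("http://localhost:9200", "supplier2", SEARCH_BODY)
```

Calling `run_with_timeout(REQUEST, timeout_seconds=30.0)` against an affected node would be expected to return `None` once the timeout elapses, while a node that handles the query returns the search response.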

kkewwei commented 1 month ago

@Roboteus, this seems to be related to #13324. I will find the cause and fix it as soon as possible.

reta commented 1 month ago

The issue does not seem to be fixed [1]; the branch in question included the supposed fix:

java.lang.RuntimeException: Failure at [search.aggregation/410_nested_aggs:62]: 60000 MILLISECONDS
    at __randomizedtesting.SeedInfo.seed([352C24D70857A0DA:BD781B0DA6ABCD22]:0)
    at org.opensearch.test.rest.yaml.OpenSearchClientYamlSuiteTestCase.executeSection(OpenSearchClientYamlSuiteTestCase.java:462)
    at org.opensearch.test.rest.yaml.OpenSearchClientYamlSuiteTestCase.test(OpenSearchClientYamlSuiteTestCase.java:433)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at java.base/java.lang.Thread.run(Thread.java:1583)

[1] https://build.ci.opensearch.org/job/gradle-check/48115/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_search_aggregation_410_nested_aggs_Supported_queries__3/

msfroh commented 1 month ago

> The issue does not seem to be fixed [1]; the branch in question included the supposed fix:

@reta -- I notice that it's failing on a MixedClusterClientYamlTestSuiteIT. Do you think it might be a result of the old 2.18 node that doesn't have the fix yet?

I noticed the skip setting is:

"Supported queries":
  - skip:
      version: " - 2.17.99"
      reason: "fixed in 2.18.0"

Maybe that should be " - 2.99.99" until we backport the fix to 2.x.
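To illustrate why the upper bound matters, here is a simplified model of a skip range. It is only a sketch: it assumes the range is an inclusive `lower - upper` pair in which an empty side means "unbounded", and the function names are hypothetical. It is not the test framework's actual parser, and it does not model how the framework picks the version to compare in a mixed cluster.

```python
def parse_version(text):
    """Turn "2.17.99" into a comparable tuple such as (2, 17, 99)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_skipped(skip_range, node_version):
    """Return True if node_version falls inside the inclusive skip range."""
    lower_text, _, upper_text = skip_range.partition(" - ")
    lower = parse_version(lower_text) if lower_text.strip() else None
    upper = parse_version(upper_text) if upper_text.strip() else None
    version = parse_version(node_version)
    if lower is not None and version < lower:
        return False
    if upper is not None and version > upper:
        return False
    return True


for skip_range in (" - 2.17.99", " - 2.99.99"):
    for node_version in ("2.16.0", "2.18.0", "3.0.0"):
        print(repr(skip_range), node_version, is_skipped(skip_range, node_version))
```

Under this model, a 2.18.0 node is not covered by " - 2.17.99", so the test would still run against a 2.18 node that lacks the fix, whereas " - 2.99.99" covers every 2.x version.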

reta commented 1 month ago

> > The issue does not seem to be fixed [1]; the branch in question included the supposed fix:
>
> @reta -- I notice that it's failing on a MixedClusterClientYamlTestSuiteIT. Do you think it might be a result of the old 2.18 node that doesn't have the fix yet?
>
> I noticed the skip setting is:
>
> "Supported queries":
>   - skip:
>       version: " - 2.17.99"
>       reason: "fixed in 2.18.0"
>
> Maybe that should be " - 2.99.99" until we backport the fix to 2.x.

That could be it, since no backports have happened. Thanks, @msfroh.

kkewwei commented 1 month ago

@reta, regarding backports: should we change the skip like this?

"Supported queries":
  - skip:
      version: " - 2.99.99"
      reason: "fixed in 3.0.0"

This failure seems to happen fairly frequently (https://build.ci.opensearch.org/job/gradle-check/48217/).

reta commented 1 month ago

@kkewwei, if this is a BWC issue, the backport to 2.x should fix it. Could you please backport manually (if it makes sense), since the auto backport failed: https://github.com/opensearch-project/OpenSearch/pull/15931#issuecomment-2360398545. Thank you.