opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.43k stars 1.72k forks source link

[BUG] aggregation on nested field returns incorrect result #14347

Closed LantaoJin closed 2 months ago

LantaoJin commented 2 months ago

Describe the bug

Aggregation value_count on field of nested type returns incorrect result. This issue only happens on nested type, object type without nested looks fine.

Related component

Search:Aggregations

To Reproduce

Repro:

# create an test index with nested field
PUT test_nested_field_index
{
  "mappings" : {
    "properties": {
      "patients": { 
        "type" : "nested"
      }
    }
  }
}
# add some testing data
PUT test_nested_field_index/_doc/100
{ 
  "patients": [ 
    {"name" : "John Doe", "age" : 56, "smoker" : true},
    {"name" : "Mary Major", "age" : 85, "smoker" : false}
  ] 
}
# value_count should return 2 instead of 0
GET test_nested_field_index/_search
{
  "size": 0,
  "aggs": {
    "count_patients_name": {
      "value_count": {
        "field": "patients.name.keyword"
      }
    }
  }
}

Expected behavior

value_count should return 2 instead of 0

Additional Details

Plugins Please list all plugins currently enabled.

Screenshots If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

Additional context Add any other context about the problem here.

mch2 commented 2 months ago

possibly related - https://github.com/opensearch-project/OpenSearch/pull/13324 / https://github.com/opensearch-project/OpenSearch/pull/14208 / https://github.com/opensearch-project/OpenSearch/issues/13288

jed326 commented 2 months ago

@LantaoJin

Please correct me if my understanding here is wrong, but it looks like your agg is on the field patients.name.keyword but from the document it should just be patients.name.

Furthermore, when I try to do your value_count agg on just patients.name, I get the following:

"root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [patients.name] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
            }
        ]

Given that I don't think there are any problems here then, could you please double check?

jed326 commented 2 months ago

Also, I think in order to interact with nested fields in an aggregation, a nested aggregation must be used. See: https://opensearch.org/docs/latest/aggregations/bucket/nested/

kkewwei commented 2 months ago

@jed326, @mch2, I test this, it'ok, it seems not a problem.

GET test_nested_field_index/_search
{
   "aggs": {
      "patients1": {
         "nested": {
            "path": "patients"
         },
         "aggs": {
            "aggs1": {
               "value_count": {
                  "field": "patients.name.keyword"
               }
            }
         }
      }
   }
}
image
jed326 commented 2 months ago

Thanks @kkewwei! I also took at look at this, I think the problem is the nested aggregation was not used in the original query. I'm going to close this issue for now, @LantaoJin please re-open if there's any other issues here, thanks!