opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.05k stars 1.67k forks source link

[BUG] I get the error trying to order by the inner terms aggregation field. #14615

Closed Its-Ankush closed 20 hours ago

Its-Ankush commented 1 week ago

Describe the bug

Opensearch doesnt allow me to order by a nested aggregation field on the outer aggregation and throws out the Invalid aggregation order path [<inner_agg_name>]. Buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end.

Related component

Search:Aggregations

To Reproduce

Create an index and send values to it like this -

POST group_sort/_doc
{
  "country":"in",
  "city":"aa"
}

POST group_sort/_doc
{
  "country":"in",
  "city":"bb"
}

POST group_sort/_doc
{
  "country":"uk",
  "city":"bb"
}

POST group_sort/_doc
{
  "country":"uk",
  "city":"rr"
}

POST group_sort/_doc
{
  "country":"uk",
  "city":"qq"
}

Try to order your outer terms aggregation countries results in asc order based on the inner terms aggregation cities

GET group_sort/_search
{
  "size": 0,
  "aggs": {
    "countries": {
      "terms": {
        "field": "country.keyword",
        "order": {
          "cities": "asc"
        }
      },
      "aggs": {
        "cities": {
          "terms": {
            "field": "city.keyword"
          }
        }
      }
    }
  }
}

Error

{
          "type": "aggregation_execution_exception",
          "reason": "Invalid aggregation order path [cities]. Buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end.",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end."
          }
        }

Expected behavior

The results should be sorted based on the inner terms aggregation field. Using a cardinality aggregation for the inner aggregation does work and I get expected results

GET group_sort/_search
{
  "size": 0,
  "aggs": {
    "countries": {
      "terms": {
        "field": "country.keyword",
        "order": {
          "cities": "asc"
        }
      },
      "aggs": {
        "cities": {
          "cardinality": {
            "field": "city.keyword"
          }
        }
      }
    }
  }
}

o/p

[
        {
          "key": "in",
          "doc_count": 2,
          "cities": {
            "value": 2
          }
        },
        {
          "key": "uk",
          "doc_count": 3,
          "cities": {
            "value": 3
          }
        }
      ]

Additional Details

Plugins n/a

Screenshots n/a

Host/Environment (please complete the following information):

Additional context I believe the custom comparator is the one which is doing to sorting so maybe changing something here might help

https://github.com/opensearch-project/OpenSearch/blob/6267e94b00d62c3fba80abf02ea584c85ea3aad0/server/src/main/java/org/opensearch/search/aggregations/bucket/terms/InternalMultiTerms.java#L201

Aim - can you please tell me why my aggregation is not working ? Is this a bug ? My main aim is to sort the results of the outer agg based on the values of the inner aggs.

TIA !

kkewwei commented 1 week ago

@Its-Ankush

My main aim is to sort the results of the outer agg based on the values of the inner aggs.'

cities is a terms aggregation, it's aggregation result contains multi buckets, can you describe what value of cities aggregation result do you want?

Its-Ankush commented 1 week ago

@kkewwei thank you so much for replying. So if I just modify the query to this -

GET group_sort/_search
{
  "size": 0,
  "aggs": {
    "countries": {
      "terms": {
        "field": "country.keyword"
      },
      "aggs": {
        "cities": {
          "terms": {
            "field": "city.keyword",
            "order": {
              "_key": "asc"
            }
          }
        }
      }
    }
  }
}

The output is -

[{
        "key": "uk",
        "doc_count": 3,
        "cities": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{
                    "key": "bb",
                    "doc_count": 1
                },
                {
                    "key": "qq",
                    "doc_count": 1
                },
                {
                    "key": "rr",
                    "doc_count": 1
                }
            ]
        }
    },
    {
        "key": "in",
        "doc_count": 2,
        "cities": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{
                    "key": "aa",
                    "doc_count": 1
                },
                {
                    "key": "bb",
                    "doc_count": 1
                }
            ]
        }
    }
]

Use case -

You see the way how the cities buckets is sorted in ascending order ? I would also like to sort the key which is countries to be sorted like that.

For uk, we have ["bb","qq","rr"] For in, we have ["aa","bb"]

If we take a look at this cities array, aa will come prior to bb as a result in should also appear above uk

Expected output -

[{
        "key": "in",
        "doc_count": 2,
        "cities": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{
                    "key": "aa",
                    "doc_count": 1
                },
                {
                    "key": "bb",
                    "doc_count": 1
                }
            ]
        }
    }, {
        "key": "uk",
        "doc_count": 3,
        "cities": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{
                    "key": "bb",
                    "doc_count": 1
                },
                {
                    "key": "qq",
                    "doc_count": 1
                },
                {
                    "key": "rr",
                    "doc_count": 1
                }
            ]
        }
    }

]

Please let me know if there is anything else I can share regarding this ?

bowenlan-amzn commented 5 days ago

@Its-Ankush then you just add the order by _key to first terms agg? not sure I get what you want correctly

"countries": {
      "terms": {
        "field": "country.keyword",
        "order": {
          "_key": "asc"
        }
      },
      "aggs": {
        "cities": {
          "terms": {
            "field": "city.keyword",
            "order": {
              "_key": "asc"
            }
          }
        }
      }
    }
Its-Ankush commented 5 days ago

@bowenlan-amzn using "_key": "asc" in the outer aggregation will only sort the countries based on the name of country.keyword but what I want is - sorting the same countries based on the arrangement of cities aggregation.

Lets say

Country City
us aaa,zab
in aaa,bba

In my aggs, i want the Country in to appear before us since if you see the values for City both have aaa but the second city for in is bba which ranks higher than zab lexicographically

bowenlan-amzn commented 5 days ago

This is not supported. I think it only supports sorting on a single value, not on a variable length of strings. That's why the error says you need final single-bucket or a metrics aggregation at the path end which produce a single value.

Its-Ankush commented 5 days ago

@bowenlan-amzn thank you for the information and the help. Please close this.