opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Range aggregation ignores query #15169

Closed: marko-bekhta closed this issue 1 month ago

marko-bekhta commented 1 month ago

Describe the bug

Range aggregation ignores the query. This works correctly in 2.15.0 and started failing in the recent 2.16.0 release.

Related component

Search:Aggregations

To Reproduce

This problem was initially noticed with a routing filter, but it can also be reproduced with other queries, e.g., a range query.

1. Create a simple index:
```http
PUT http://localhost:32772/range-agg-test-000001
Content-Type: application/json

{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "Integer": {
        "type": "integer"
      }
    }
  }
}
```

2. Add a few documents to the index:
```http
POST http://localhost:32772/range-agg-test-000001/_doc/1?routing=route1
Content-Type: application/json

{
  "Integer": -2147483648
}
POST http://localhost:32772/range-agg-test-000001/_doc/2?routing=route1
Content-Type: application/json

{
  "Integer": -2147483648
}
POST http://localhost:32772/range-agg-test-000001/_doc/200?routing=route2
Content-Type: application/json

{
  "Integer": -2147483648
}
```

3.1. Run a range aggregation with a filter on the routing value:

```http
POST http://localhost:32772/range-agg-test-000001/_search
Content-Type: application/json

{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": [
        {
          "terms": {
            "_routing": [
              "route1"
            ]
          }
        }
      ]
    }
  },
  "aggregations": {
    "aggregationName": {
      "range": {
        "field": "Integer",
        "keyed": true,
        "ranges": [
          {
            "to": 0,
            "key": "0"
          },
          {
            "from": 0,
            "key": "1"
          }
        ]
      }
    }
  },
  "_source": false
}
```

3.2. Alternatively, use a simple range query (`gte`) instead:

```http
POST http://localhost:32772/range-agg-test-000001/_search
Content-Type: application/json

{
  "query": {
    "range": {
      "Integer": {
        "gte": 10
      }
    }
  },
  "aggregations": {
    "aggregationName": {
      "range": {
        "field": "Integer",
        "keyed": true,
        "ranges": [
          {
            "to": 0,
            "key": "0"
          },
          {
            "from": 0,
            "key": "1"
          }
        ]
      }
    }
  },
  "_source": false
}
```

In both cases, the result is:

```json
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "aggregationName": {
      "buckets": {
        "0": {
          "to": 0.0,
          "doc_count": 3
        },
        "1": {
          "from": 0.0,
          "doc_count": 0
        }
      }
    }
  }
}
```
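Because the two ranges in this reproduction are complementary (`to: 0` and `from: 0`) and `Integer` holds a single value per document, the bucket counts should never add up to more than `hits.total.value`; the response above violates that. A minimal sketch of a regression check built on that invariant is below. `aggregation_exceeds_hits` is a hypothetical helper, and it assumes non-overlapping ranges, a single-valued field, and an exact hit total (`relation: "eq"`, returned as an object).

```python
from collections.abc import Mapping
from typing import Any


def aggregation_exceeds_hits(response: Mapping[str, Any], agg_name: str) -> bool:
    """Return True if a range aggregation counted more documents than the query matched.

    Hypothetical helper. Only meaningful when the ranges do not overlap and the
    aggregated field is single-valued, so each matching document lands in at most
    one bucket.
    """
    total = response["hits"]["total"]
    if total["relation"] != "eq":
        # A lower-bound total ("gte") cannot be compared reliably.
        return False
    buckets = response["aggregations"][agg_name]["buckets"]
    # Keyed range aggregations return buckets as an object, unkeyed ones as a list.
    bucket_list = buckets.values() if isinstance(buckets, Mapping) else buckets
    counted = sum(bucket["doc_count"] for bucket in bucket_list)
    return counted > total["value"]


# The relevant parts of the 2.16.0 response shown above.
observed = {
    "hits": {"total": {"value": 0, "relation": "eq"}},
    "aggregations": {
        "aggregationName": {
            "buckets": {
                "0": {"to": 0.0, "doc_count": 3},
                "1": {"from": 0.0, "doc_count": 0},
            }
        }
    },
}
print(aggregation_exceeds_hits(observed, "aggregationName"))  # True
```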

Expected behavior

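The aggregation should only count documents that match the query. In case 3.1, bucket "0" should have a doc_count of 2 (documents 1 and 2, both routed to route1) and bucket "1" a doc_count of 0; in case 3.2, no document has Integer >= 10, so both buckets should have a doc_count of 0.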
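The expected semantics can be modeled locally in a few lines of Python: the query selects documents first, and only those documents are bucketed, with `from` inclusive and `to` exclusive as in the range aggregation. This is a minimal sketch for comparison, not OpenSearch's implementation; `Doc`, `Range`, and `range_agg` are hypothetical names, and each query is modeled as a plain Python predicate.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Doc:
    doc_id: str
    routing: str
    integer: int


@dataclass
class Range:
    key: str
    from_: Optional[float] = None  # inclusive lower bound
    to: Optional[float] = None  # exclusive upper bound


def range_agg(docs: list[Doc], query: Callable[[Doc], bool], ranges: list[Range]) -> dict[str, int]:
    """Count query-matching documents per range (hypothetical model, not OpenSearch code)."""
    matched = [d for d in docs if query(d)]  # the query is applied before bucketing
    return {
        r.key: sum(
            1
            for d in matched
            if (r.from_ is None or d.integer >= r.from_) and (r.to is None or d.integer < r.to)
        )
        for r in ranges
    }


MIN_INT = -2147483648
docs = [Doc("1", "route1", MIN_INT), Doc("2", "route1", MIN_INT), Doc("200", "route2", MIN_INT)]
ranges = [Range(key="0", to=0), Range(key="1", from_=0)]

print(range_agg(docs, lambda d: d.routing == "route1", ranges))  # case 3.1: {'0': 2, '1': 0}
print(range_agg(docs, lambda d: d.integer >= 10, ranges))  # case 3.2: {'0': 0, '1': 0}
```

Passing a match-all predicate (`lambda d: True`) yields `{'0': 3, '1': 0}`, the same counts 2.16.0 returned, which is consistent with the query being ignored during aggregation.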

Additional Details

Plugins

ec6d9b89d3cf opensearch-alerting                  2.16.0.0
ec6d9b89d3cf opensearch-anomaly-detection         2.16.0.0
ec6d9b89d3cf opensearch-asynchronous-search       2.16.0.0
ec6d9b89d3cf opensearch-cross-cluster-replication 2.16.0.0
ec6d9b89d3cf opensearch-custom-codecs             2.16.0.0
ec6d9b89d3cf opensearch-flow-framework            2.16.0.0
ec6d9b89d3cf opensearch-geospatial                2.16.0.0
ec6d9b89d3cf opensearch-index-management          2.16.0.0
ec6d9b89d3cf opensearch-job-scheduler             2.16.0.0
ec6d9b89d3cf opensearch-knn                       2.16.0.0
ec6d9b89d3cf opensearch-ml                        2.16.0.0
ec6d9b89d3cf opensearch-neural-search             2.16.0.0
ec6d9b89d3cf opensearch-notifications             2.16.0.0
ec6d9b89d3cf opensearch-notifications-core        2.16.0.0
ec6d9b89d3cf opensearch-observability             2.16.0.0
ec6d9b89d3cf opensearch-performance-analyzer      2.16.0.0
ec6d9b89d3cf opensearch-reports-scheduler         2.16.0.0
ec6d9b89d3cf opensearch-security                  2.16.0.0
ec6d9b89d3cf opensearch-security-analytics        2.16.0.0
ec6d9b89d3cf opensearch-skills                    2.16.0.0
ec6d9b89d3cf opensearch-sql                       2.16.0.0
ec6d9b89d3cf query-insights                       2.16.0.0

Host/Environment

Additional context

Running OpenSearch using the official opensearchproject/opensearch:2.16.0 image.

harshavamsi commented 1 month ago

Haven't tried to reproduce yet, but most likely due to #13865

@bowenlan-amzn can you take a look?

marko-bekhta commented 1 month ago

Thanks for taking a look. Yeah, I was looking at the changelog, and https://github.com/opensearch-project/OpenSearch/pull/13865 also caught my eye.

By the way, a question not entirely related to the issue at hand: do you happen to have a snapshot build of OpenSearch that we could run tests against? Ideally, it would be a snapshot Docker image, so that it's easier to integrate the testing into our CI. That would help catch this kind of issue before a release.

harshavamsi commented 1 month ago

I could reproduce this; in both cases we get:

```json
"aggregations": {
    "aggregationName": {
        "buckets": {
            "0": {
                "to": 0.0,
                "doc_count": 3
            },
            "1": {
                "from": 0.0,
                "doc_count": 0
            }
        }
    }
}
```

As for snapshots, I know we publish them here: https://aws.oss.sonatype.org/content/repositories/snapshots/org/opensearch/opensearch/

dennisoelkers commented 1 month ago

I can also confirm that this happens when using a date_range aggregation: in 2.16.0, all filters in the query are ignored.

bowenlan-amzn commented 1 month ago

@marko-bekhta @dennisoelkers sorry for the late reply. Please try setting the cluster setting search.max_aggregation_rewrite_filters to 0 to disable this filter rewrite optimization; aggregations should then fall back to the default aggregation path.
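Applying the workaround is a single request to the cluster settings API. The sketch below builds that request with Python's standard `urllib.request`; `workaround_body` and `build_request` are hypothetical helpers, and it assumes an unsecured node (plain HTTP, no authentication) at the address used in the reproduction steps. Using `persistent` rather than `transient` is a choice made here so that the setting survives a full cluster restart.

```python
import json
import urllib.request

# Setting name taken from the workaround above; 0 disables the filter rewrite.
SETTING = "search.max_aggregation_rewrite_filters"


def workaround_body(value: int = 0) -> dict:
    """Payload for PUT _cluster/settings (hypothetical helper)."""
    return {"persistent": {SETTING: value}}


def build_request(base_url: str, value: int = 0) -> urllib.request.Request:
    """Build, but do not send, the cluster settings update (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(workaround_body(value)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Assumed address of an unsecured local node, as in the reproduction steps.
    req = build_request("http://localhost:32772")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` sends it. Once a release containing a fix is deployed, sending the same request with the value set to `null` resets the setting to its default.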

getsaurabh02 commented 1 month ago

@dennisoelkers can you confirm whether disabling the dynamic setting search.max_aggregation_rewrite_filters mitigates the issue here?

drewmiranda-gl commented 1 month ago

@getsaurabh02 I've just tested it, and it does appear to correctly return 0:

Comparison (the first item is the more recent one, taken after applying search.max_aggregation_rewrite_filters: 0; the second is from before the setting was applied):

[Screenshot: results after (first) and before (second) applying the setting]

I'll defer to Dennis for his expertise though.

drewmiranda-gl commented 1 month ago

So far we've received two (three, counting my test above) confirmations that the workaround resolves the issue. 🙌

This is great news.

dblock commented 1 month ago

Maybe someone on this thread could add some agg examples to https://github.com/opensearch-project/opensearch-api-specification/tree/main/tests and make sure we have those specs? We have lots of users asking for agg support in clients (example)

dennisoelkers commented 1 month ago

@getsaurabh02: I can verify that the setting fixes it for us now; results are as expected again, at least for the cases I tested (using a date_range aggregation).