opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.69k stars 1.79k forks source link

[BUG] Invalid Field Names in Metric Aggregation Queries that use star tree returns 500 Internal Server Error #16473

Open expani opened 20 hours ago

expani commented 20 hours ago

Describe the bug

When running Metric aggregation queries over indices that have star tree enabled, using invalid field names in the query leads to a 500 Internal Server Error.

Related component

Search

Reproduction Steps

Checkout the main branch of OpenSearch.

We need to enable the star tree feature flag Add this line in OpenSearchNode#createConfiguration() which is the default config used with gradlew run.

baseConfig.put("opensearch.experimental.feature.composite_index.star_tree.enabled", "true");

Build the checkout and run the server ./gradlew run

Enable the Cluster setting indices.composite_index.star_tree.enabled for indexing

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'{
  "persistent" : {
    "indices.composite_index.star_tree.enabled" : "true"
  }
}'

Create Index

curl -XPUT -H'Content-type: application/json' -kv localhost:9200/logs -d '{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "startree1": {
        "type": "star_tree",
        "config": {
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "size",
              "stats": [
                "sum"
              ]
             },
             {
              "name": "latency",
              "stats": [
                "avg"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}'

Run a search query containing an invalid field name. request_size in this case.

curl -XPOST -H'Content-type: application/json' -kv localhost:9200/logs/_search -d '{
  "size": 0,
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}'

A 500 Internal Server Error is thrown with the below exception seen in logs.

[2024-10-24T19:56:56,719][WARN ][r.suppressed             ] [runTask-0] path: /logs/_search, params: {index=logs}
org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:775) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:395) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:815) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:548) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:316) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:760) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1719) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1505) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1619) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1593) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:81) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:75) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:104) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:982) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
        at java.base/java.lang.Thread.run(Thread.java:1570) [?:?]
Caused by: org.opensearch.OpenSearchException$3: Cannot invoke "org.opensearch.search.aggregations.support.FieldContext.field()" because the return value of "org.opensearch.search.aggregations.support.ValuesSourceConfig.fieldContext()" is null
        at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:710) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:393) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        ... 23 more
Caused by: java.lang.NullPointerException: Cannot invoke "org.opensearch.search.aggregations.support.FieldContext.field()" because the return value of "org.opensearch.search.aggregations.support.ValuesSourceConfig.fieldContext()" is null
        at org.opensearch.search.aggregations.support.ValuesSourceAggregatorFactory.getField(ValuesSourceAggregatorFactory.java:107) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.compositeindex.datacube.startree.utils.StarTreeQueryHelper.validateStarTreeMetricSupport(StarTreeQueryHelper.java:153) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.compositeindex.datacube.startree.utils.StarTreeQueryHelper.getStarTreeQueryContext(StarTreeQueryHelper.java:77) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.search.SearchService.parseSource(SearchService.java:1550) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.search.SearchService.createContext(SearchService.java:1108) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:705) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:678) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        ... 8 more

Expected behavior

400 Bad Request should be returned with appropriate error message.

expani commented 19 hours ago

Field Context is retrieved from the Shard Context mapper here and remains null if it's not found.

If we handle null check here and throw appropriate exception here I think the issue can be solved.

Let me know your thoughts on this. @bharath-techie @sandeshkr419 I

expani commented 14 hours ago

The above checks fixed the issue. But, there are other issues like validation of field name present in query against metrics and dimensions (CompositeDataCubeFieldType ) to ensure it can make use of star tree index.

Will take a deeper look into it later.

expani commented 1 hour ago

There is no error in the query when Star Tree feature flag is disabled. It just returns empty hits.

{"took":4,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]},"aggregations":{"average_latency":{"value":null},"sum_request_size":{"value":0.0}}}

So, we were never validating fields used in aggregation against the actual fields of index. Most of the existing aggregation queries without star tree have the same issue.

I was able to manage the same response for metric aggregation queries with these changes.