opensearch-project / OpenSearch-Dashboards

📊 Open source visualization dashboards for OpenSearch.
https://opensearch.org/docs/latest/dashboards/index/
Apache License 2.0

[BUG] Visualising time series of average value of rolled up index displays error #921

Open · JanSvoboda opened this issue 2 years ago

JanSvoboda commented 2 years ago

Creating a time series visualization of a rolled-up index, specifically of an average value, fails. The following error is displayed:

The request for this panel failed
Cannot invoke "Object.getClass()" because "receiver" is null
sum += a[0]; ^---- HERE

[Screenshot from 2021-11-08 23-59-40]

The rolled-up index is built from Metricbeat data and was created via the Dashboards GUI.

The rollup job aggregates on @timestamp and service-node.keyword (terms). The aggregated metrics are:

all options are checked, i.e. min, max, sum, avg, value count.

The JSON payload generated by the GUI to search the indices is as follows:

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "2021-11-03T21:50:33.264Z",
              "lte": "2021-11-04T21:50:33.264Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "filter": [
        { "match_all": {} }
      ],
      "should": [],
      "must_not": []
    }
  },
  "aggs": {
    "61ca57f1-469d-11e7-af02-69e470af7417": {
      "terms": {
        "field": "service-node.keyword",
        "order": { "_key": "desc" }
      },
      "aggs": {
        "timeseries": {
          "date_histogram": {
            "field": "@timestamp",
            "min_doc_count": 0,
            "time_zone": "Europe/Prague",
            "extended_bounds": {
              "min": 1635976233264,
              "max": 1636062633264
            },
            "calendar_interval": "1m"
          },
          "aggs": {
            "61ca57f2-469d-11e7-af02-69e470af7417": {
              "avg": { "field": "system.cpu.system.norm.pct" }
            }
          }
        }
      },
      "meta": {
        "timeField": "@timestamp",
        "intervalString": "1m",
        "bucketSize": 60,
        "seriesId": "61ca57f1-469d-11e7-af02-69e470af7417"
      }
    }
  },
  "timeout": "30000ms"
}

OpenSearch response JSON payload:

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "script_exception",
      "reason": "runtime error",
      "script_stack": [
        "sum += a[0]; ",
        "^---- HERE"
      ],
      "script": "double sum = 0; double count = 0; for (a in states) { sum += a[0]; count += a[1]; } return sum/count",
      "lang": "painless",
      "position": {
        "offset": 54,
        "start": 54,
        "end": 67
      },
      "caused_by": {
        "type": "null_pointer_exception",
        "reason": "Cannot invoke \"Object.getClass()\" because \"receiver\" is null"
      }
    }
  },
  "status": 400
}
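For context: judging by the states variable, the failing script is the reduce phase of a scripted metric aggregation, which the index-management plugin appears to generate when it rewrites an avg aggregation against a rolled-up index into a combination of the pre-computed sum and value count. The NullPointerException means one entry of states is null, i.e. a shard that produced no state. A null-guarded variant of that reduce script, as a sketch of where a fix would likely go (the script is generated by the plugin, so this is not something a user can change from TSVB):

double sum = 0; double count = 0;
for (a in states) {
  // A shard that never ran the map phase contributes a null state;
  // skipping it avoids 'Cannot invoke "Object.getClass()" because "receiver" is null'.
  if (a != null) {
    sum += a[0];
    count += a[1];
  }
}
// Also avoid dividing by zero when no shard returned data.
return count > 0 ? sum / count : null;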

JSON of rollup job:

{
  "_id": "rrrr",
  "_seqNo": 367661,
  "_primaryTerm": 1,
  "rollup": {
    "rollup_id": "rrrr",
    "enabled": false,
    "schedule": {
      "interval": {
        "start_time": 1636062376706,
        "period": 1,
        "unit": "Minutes"
      }
    },
    "last_updated_time": 1636062376706,
    "enabled_time": null,
    "description": "",
    "schema_version": 11,
    "source_index": "rollup-testing-100*",
    "target_index": "rollup-testing-201",
    "metadata_id": "q3nr7HwB_CHlSCQynRlm",
    "page_size": 1000,
    "delay": 0,
    "continuous": false,
    "dimensions": [
      {
        "date_histogram": {
          "fixed_interval": "1m",
          "source_field": "@timestamp",
          "target_field": "@timestamp",
          "timezone": "UTC"
        }
      },
      {
        "terms": {
          "source_field": "service-node.keyword",
          "target_field": "service-node.keyword"
        }
      }
    ],
    "metrics": [
      {
        "source_field": "system.cpu.system.norm.pct",
        "metrics": [
          { "min": {} },
          { "max": {} },
          { "sum": {} },
          { "avg": {} },
          { "value_count": {} }
        ]
      },
      {
        "source_field": "system.cpu.user.norm.pct",
        "metrics": [
          { "min": {} },
          { "max": {} },
          { "sum": {} },
          { "avg": {} },
          { "value_count": {} }
        ]
      }
    ]
  },
  "metadata": {
    "rrrr": {
      "metadata_id": "q3nr7HwB_CHlSCQynRlm",
      "rollup_metadata": {
        "rollup_id": "rrrr",
        "last_updated_time": 1636062440221,
        "status": "finished",
        "failure_reason": null,
        "stats": {
          "pages_processed": 10,
          "documents_processed": 1518881,
          "rollups_indexed": 8346,
          "index_time_in_millis": 809,
          "search_time_in_millis": 1958
        }
      }
    }
  }
}
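As a side note for anyone reproducing this: whether the rollup job actually ran to completion can be checked through the index-management rollup API (a sketch; rrrr is the job id from the JSON above):

GET _plugins/_rollup/jobs/rrrr/_explain

The metadata block above already shows "status": "finished" with 8346 rollups indexed, so the job itself completed; the failure is purely on the query side.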

To Reproduce

Steps to reproduce the behavior:

  1. Create metricbeat index
  2. Populate index
  3. Create a rollup job with aggregations based on @timestamp and a custom term field, and aggregate two metrics:
    • system.cpu.system.norm.pct
    • system.cpu.user.norm.pct
  4. Execute the rollup job to create the rolled-up index.
  5. Create a time series visualization:
    • in Panel options, set the index pattern to the name of the rolled-up index
    • in Data, select Average as the aggregation and either system.cpu.system.norm.pct or system.cpu.user.norm.pct as the field
    • group by Terms, selecting the custom field used for the rollup job in step 3
    • set a timeframe in which the index contains data
  6. See the error (a minimal request that reproduces the failure outside Dashboards is sketched after this list)
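The failure can likely be reproduced without Dashboards at all, since TSVB ultimately issues a search with an avg sub-aggregation, as in the payload above. A minimal sketch against the rolled-up index from this report (index and field names come from the rollup job JSON; the aggregation names are arbitrary):

GET rollup-testing-201/_search
{
  "size": 0,
  "aggs": {
    "per_node": {
      "terms": { "field": "service-node.keyword" },
      "aggs": {
        "avg_cpu": { "avg": { "field": "system.cpu.system.norm.pct" } }
      }
    }
  }
}

If the bug lives in the rollup interceptor's rewrite of avg, this request should fail with the same script_exception as the TSVB panel.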

Expected behavior

Average values are displayed, aggregated per custom field, as already works for count, min, max, etc.

OpenSearch Version 1.1.0

Dashboards Version 1.1.0


manasvinibs commented 1 year ago

Hi @JanSvoboda, I'm trying to reproduce this issue. I'm new to using the TSVB visualization and Metricbeat, so I'm having difficulty reproducing it and would appreciate your help in understanding it. First, are you still experiencing this issue? It's been quite some time since you created it. If the issue still persists, can you provide detailed steps describing how to create the Metricbeat index, how to populate it, and how to create rollup jobs using it?

BSFishy commented 1 year ago

@JanSvoboda can you provide a response to the previous question?

lexxxel commented 1 year ago

@manasvinibs and @BSFishy I have the same issue. I found out that this issue is not limited to TSVB; it also affects normal line charts. Basically, all you need is some kind of index (in my example I call it cpu) containing the following:

[image] (I roll up only cpu_p, but it is equal to cpu0.p_cpu in the image.) I use fluent-bit to collect CPU stats for that (the 'Host' field comes from the fluent-bit 'modify' filter, which adds Host $(HOSTNAME)). I can give you a minimal example config file if required; see the sketch below.
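If it helps reproduction, a minimal fluent-bit configuration along those lines might look like this (a hedged sketch: host, port, and index name are placeholders, and the opensearch output plugin requires fluent-bit 1.9 or later):

# Collect CPU usage, tag each record with the host name,
# and ship it to an OpenSearch index named "cpu".
[INPUT]
    Name  cpu
    Tag   cpu

[FILTER]
    Name   modify
    Match  cpu
    # Adds the Host field used later for the terms split
    Add    Host ${HOSTNAME}

[OUTPUT]
    Name   opensearch
    Match  cpu
    Host   localhost
    Port   9200
    Index  cpu
    Suppress_Type_Name On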

Back to the issue: now create any rollup job that allows you to replicate the following visualization: [image]

I created a rollup job similar to @JanSvoboda's: [image] With the following index: [image]

Let me explain the bug first in the line chart:

  1. open the line chart

  2. add buckets -> split series -> terms -> field: Host.keyword

  3. press update and get a working plot: [image]

  4. switch metrics -> (Aggregation: Average; Field: cpu_p) and get an error: [image]

    Error: Internal Server Error
      at Fetch._callee3$ (https://xxx.dev:5601/5367/bundles/core/core.entry.js:15:584612)
      at tryCatch (https://xxx.dev:5601/5367/bundles/plugin/queryWorkbenchDashboards/queryWorkbenchDashboards.plugin.js:2:2179)
      at Generator._invoke (https://xxx.dev:5601/5367/bundles/plugin/queryWorkbenchDashboards/queryWorkbenchDashboards.plugin.js:2:1802)
      at Generator.next (https://xxx.dev:5601/5367/bundles/plugin/queryWorkbenchDashboards/queryWorkbenchDashboards.plugin.js:2:2954)
      at fetch_asyncGeneratorStep (https://xxx.dev:5601/5367/bundles/core/core.entry.js:15:577704)
      at _next (https://xxx.dev:5601/5367/bundles/core/core.entry.js:15:578020)

    the server will print the following error:

    [2023-02-25T23:00:17,140][WARN ][r.suppressed             ] [xxx.dev] path: /rollup_cpu_4/_search, params: {ignore_unavailable=true, preference=1677362860452, index=rollup_cpu_4, timeout=30000ms, track_total_hits=true}
    org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:663) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:372) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:698) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:471) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:294) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:753) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.transport.TransportService$6.handleException(TransportService.java:794) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:312) [opensearch-security-2.5.0.0.jar:2.5.0.0]
    at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1414) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1528) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1502) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:79) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:71) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:103) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) [opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.5.0.jar:2.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
    Caused by: org.opensearch.search.aggregations.AggregationExecutionException: Invalid aggregation order path [1]. Buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end.
    at org.opensearch.search.aggregations.InternalOrder$Aggregation.partiallyBuiltBucketComparator(InternalOrder.java:98) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.InternalOrder$CompoundOrder.lambda$partiallyBuiltBucketComparator$0(InternalOrder.java:204) ~[opensearch-2.5.0.jar:2.5.0]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
    at java.util.LinkedList$LLSpliterator.forEachRemaining(LinkedList.java:1242) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
    at org.opensearch.search.aggregations.InternalOrder$CompoundOrder.partiallyBuiltBucketComparator(InternalOrder.java:205) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:219) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:70) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:118) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$2.create(TermsAggregatorFactory.java:503) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.TermsAggregatorFactory$1.build(TermsAggregatorFactory.java:140) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:311) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:76) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:101) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:278) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:68) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:151) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesService.lambda$loadIntoContext$24(IndicesService.java:1677) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesService.lambda$cacheShardLevelResult$25(IndicesService.java:1736) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:201) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:184) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.common.cache.Cache.computeIfAbsent(Cache.java:461) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:151) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1742) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesService.loadIntoContext(IndicesService.java:1676) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:528) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:594) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:563) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:73) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.5.0.jar:2.5.0]
    ... 8 more
    Caused by: java.lang.IllegalArgumentException: Buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end.
    at org.opensearch.search.aggregations.Aggregator.bucketComparator(Aggregator.java:153) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.support.AggregationPath.bucketComparator(AggregationPath.java:244) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.InternalOrder$Aggregation.partiallyBuiltBucketComparator(InternalOrder.java:95) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.InternalOrder$CompoundOrder.lambda$partiallyBuiltBucketComparator$0(InternalOrder.java:204) ~[opensearch-2.5.0.jar:2.5.0]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
    at java.util.LinkedList$LLSpliterator.forEachRemaining(LinkedList.java:1242) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
    at org.opensearch.search.aggregations.InternalOrder$CompoundOrder.partiallyBuiltBucketComparator(InternalOrder.java:205) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:219) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:70) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:118) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$2.create(TermsAggregatorFactory.java:503) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.TermsAggregatorFactory$1.build(TermsAggregatorFactory.java:140) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:311) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:76) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:101) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:278) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:68) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:151) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesService.lambda$loadIntoContext$24(IndicesService.java:1677) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesService.lambda$cacheShardLevelResult$25(IndicesService.java:1736) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:201) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:184) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.common.cache.Cache.computeIfAbsent(Cache.java:461) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:151) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1742) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.indices.IndicesService.loadIntoContext(IndicesService.java:1676) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:528) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:594) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:563) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:73) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.5.0.jar:2.5.0]
    ... 8 more
  5. switch the Order by to Alphabetical and update -> now it works (you could also switch to custom metric -> Aggregation: Sum); see the request sketch after this list

  6. add sub-bucket: x-axis -> sub aggregation: date histogram and get the final visualization: [image]

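The workaround in step 5 makes sense given what the visualization editor generates. Ordering the split by the metric produces a terms aggregation whose order references the sub-aggregation by its numeric id, which matches the "Invalid aggregation order path [1]" in the stack trace. Roughly (a sketch, not the editor's exact output):

{
  "aggs": {
    "2": {
      "terms": {
        "field": "Host.keyword",
        "order": { "1": "desc" }
      },
      "aggs": {
        "1": { "avg": { "field": "cpu_p" } }
      }
    }
  }
}

Switching Order by to Alphabetical turns the order into { "_key": "asc" }, so the terms aggregation no longer sorts on the avg sub-aggregation; that presumably sidesteps the error because, on a rollup index, the avg gets rewritten into a form that is no longer a valid bucket-sort target.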

Now about the TSVB:

  1. switch to Panel options and configure the Data panel: [image]
  2. switch to Data and configure Group by: Terms and By: Host.keyword; keep in mind that Aggregation is still set to Count: [image]
  3. switch Aggregation to Average: [image] This time there is no server error. This is the same bug @JanSvoboda had.
  4. optional: switch Order by to `Average of cpu_p` to trigger the same server error as for the line chart, and get a `The request for this panel failed` error message instead of the `sum += a[0]` one

sharathganga commented 1 month ago

@manasvinibs I'm facing the same issue using AWS OpenSearch 2.11.

Steps to reproduce:

I use the OTEL Collector and Fluentd to collect metrics for k8s pods and write them to an index called k8s-metrics, which is managed by an ISM policy that does the rollover. I've updated the existing policy to create a rollup index with 1h of data rolled up; this index contains the specific terms and aggregation fields I'm interested in (like CPU, memory, network) and the Average metric for those fields.

ISM Policy:

{
    "id": "k8s-metrics-rollover-and-delete-old-indexes",
    "seqNo": 5874120,
    "primaryTerm": 1,
    "policy": {
        "policy_id": "k8s-metrics",
        "last_updated_time": 1719219896732,
        "schema_version": 19,
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "rollover": {
                            "min_index_age": "1d",
                            "copy_alias": false
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "warm",
                        "conditions": {
                            "min_index_age": "0d"
                        }
                    }
                ]
            },
            {
                "name": "warm",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "rollup": {
                            "ism_rollup": {
                                "description": "k8s-metrics-rollup-job-1h",
                                "target_index": "k8s-metrics-rollup-1h",
                                "page_size": 1000,
                                "dimensions": [
                                    {
                                        "date_histogram": {
                                            "fixed_interval": "1h",
                                            "source_field": "@timestamp",
                                            "target_field": "@timestamp",
                                            "timezone": "UTC",
                                            "format": null
                                        }
                                    },
                                    {
                                        "terms": {
                                            "source_field": "k8s.application.name.keyword",
                                            "target_field": "k8s.application.name.keyword"
                                        }
                                    },
                                    {
                                        "terms": {
                                            "source_field": "k8s.deployment.name.keyword",
                                            "target_field": "k8s.deployment.name.keyword"
                                        }
                                    },
                                    {
                                        "terms": {
                                            "source_field": "k8s.pod.name.keyword",
                                            "target_field": "k8s.pod.name.keyword"
                                        }
                                    },
                                    {
                                        "terms": {
                                            "source_field": "k8s.namespace.name.keyword",
                                            "target_field": "k8s.namespace.name.keyword"
                                        }
                                    }
                                ],
                                "metrics": [
                                    {
                                        "source_field": "k8s.pod.cpu.usage",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    },
                                    {
                                        "source_field": "k8s.pod.cpu_limit",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    },
                                    {
                                        "source_field": "k8s.pod.cpu_request",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    },
                                    {
                                        "source_field": "k8s.pod.filesystem.usage",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    },
                                    {
                                        "source_field": "k8s.pod.memory.working_set",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    },
                                    {
                                        "source_field": "k8s.pod.memory_limit",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    },
                                    {
                                        "source_field": "k8s.pod.memory_request",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    },
                                    {
                                        "source_field": "k8s.pod.network.errors",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    },
                                    {
                                        "source_field": "k8s.pod.network.io",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    },
                                    {
                                        "source_field": "k8s.pod.phase",
                                        "metrics": [
                                            {
                                                "avg": {}
                                            }
                                        ]
                                    }
                                ]
                            }
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "30d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "delete": {}
                    }
                ],
                "transitions": []
            }
        ],
        "ism_template": [
            {
                "index_patterns": [
                    "k8s-metrics-*"
                ],
                "priority": 100,
                "last_updated_time": 1718171221324
            }
        ]
    }
}
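Before blaming the visualization, it may be worth confirming that the ISM rollup action ran and that the target index actually holds documents (a sketch using the ISM explain and count APIs):

GET _plugins/_ism/explain/k8s-metrics-*

GET k8s-metrics-rollup-1h/_count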

Once the rolled-up index k8s-metrics-rollup-1h is created, go to OpenSearch Dashboards -> Visualize -> Create visualization -> TSVB.

When I select the Average aggregation for any metric (for example, k8s.pod.memory.working_set), it gives me the same error reported in this bug.

[image]

OpenSearch Response:

"61ca57f0-469d-11e7-af02-69e470af7417": {
    "id": "61ca57f0-469d-11e7-af02-69e470af7417",
    "statusCode": 400,
    "error": {
        "error": {
            "root_cause": [],
            "type": "search_phase_execution_exception",
            "reason": "",
            "phase": "fetch",
            "grouped": true,
            "failed_shards": [],
            "caused_by": {
                "type": "script_exception",
                "reason": "runtime error",
                "script_stack": [
                    "sum += a[0]; ",
                    "^---- HERE"
                ],
                "script": "double sum = 0; double count = 0; for (a in states) { sum += a[0]; count += a[1]; } return sum/count",
                "lang": "painless",
                "position": {
                    "offset": 54,
                    "start": 54,
                    "end": 67
                },
                "caused_by": {
                    "type": "null_pointer_exception",
                    "reason": "Cannot invoke \"Object.getClass()\" because \"receiver\" is null"
                }
            }
        },
        "status": 400
    },
    "series": []
}

NOTE: I can still use the regular indexes to create visualizations; it is only the rollup index that gives me this issue.

I have large volumes of metric data coming into OpenSearch, and being unable to visualize the periodically rolled-up data makes OpenSearch practically unusable for this use case. A quick fix would be highly appreciated. Please let me know if you need any more information.
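Until the avg rewrite is fixed, one possible interim workaround (an untested sketch, only applicable when the rollup job stored both sum and value_count, as in the original report; the ISM policy above stores only avg, so it would need those metrics added) is to compute the average from the two metrics that reportedly still work on rollup indices, using a bucket_script pipeline aggregation:

GET rollup-testing-201/_search
{
  "size": 0,
  "aggs": {
    "per_node": {
      "terms": { "field": "service-node.keyword" },
      "aggs": {
        "cpu_sum": { "sum": { "field": "system.cpu.system.norm.pct" } },
        "cpu_count": { "value_count": { "field": "system.cpu.system.norm.pct" } },
        "cpu_avg": {
          "bucket_script": {
            "buckets_path": { "s": "cpu_sum", "c": "cpu_count" },
            "script": "params.s / params.c"
          }
        }
      }
    }
  }
}

The aggregation names here are arbitrary; the index and field come from the first report in this thread.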

dblock commented 3 weeks ago

[Catch All Triage, attendees 1, 2, 3, 4, 5, 6, 7]