Open Sreevani871 opened 3 years ago
The line at https://github.com/opendistro-for-elasticsearch/index-management/blob/v1.12.0.0/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/util/RollupUtils.kt#L246 should be changed to: state.sums = 0L; state.counts = 0L;
Ref: https://www.elastic.co/guide/en/elasticsearch/painless/7.10/painless-literals.html#integer-literals
Ref: https://github.com/elastic/elasticsearch/issues/27199
Every aggregation that shows a wrong value for avg appears to take the sum as 2147483647 and divide that by the count, which produces the wrong result. This can be verified by multiplying each wrong avg value in the rolled-up search by its count: the product comes out to 2147483647 for the sum.
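The check described above can be reproduced with the numbers from the example bucket in the issue description. This is only a minimal arithmetic sketch; the variable names are illustrative and are not taken from the plugin code.

```python
# Values copied from the example bucket in the rollup index response.
reported_avg = 754.1463076048377
value_count = 2847569
reported_sum = 1.5818190941e10

# Largest value of a signed 32-bit integer.
INT_MAX = 2**31 - 1  # 2147483647

# If avg was computed as (some sum) / count, this recovers that sum.
implied_sum = reported_avg * value_count

print(round(implied_sum))             # sum implied by the wrong avg
print(round(implied_sum) == INT_MAX)  # does it equal the 32-bit integer maximum?
print(reported_sum > INT_MAX)         # the real sum does not fit in 32 bits
```

The implied sum matching the 32-bit integer maximum, while the real sum is larger than that, is consistent with the sum being held in an int rather than a long somewhere in the avg computation.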
Any help here, @dbbaughe? One more issue is with the delay field in the rollup job configuration. When I configured the job with the continuous field set to true and the delay field set to 300000 (milliseconds), the job execution did not honour the delay time. In the code, the delay field type is defined as long. Which time unit is it converted to during execution?
Any help here?
Describe the bug
The same aggregation query is run against the source index and the rollup index to compare aggregation metric values, and the results do not match. The average aggregation query on the rollup index gives incorrect results.
Rollup Job Configuration
curl -XPUT "localhost:9200/_opendistro/_rollup/jobs/rollup-test?pretty" -H "Content-Type:application/json" -d '{ "rollup": { "enabled": true, "schedule": { "cron": { "expression": "*/1 * * * *", "timezone":"UTC" } }, "description": "Test rollup job", "source_index": "jaeger-span-2021.04.17-000103", "target_index": "rollup-test", "page_size": 5000, "delay": 300, "continuous": false, "dimensions": [ { "date_histogram": { "source_field": "startTimeMillis", "fixed_interval": "1h", "timezone": "UTC" } }, { "terms": { "source_field": "process.serviceName" } }, { "terms": { "source_field": "process.tag.application@version" } }, { "terms": { "source_field": "operationName" } }, { "terms": { "source_field": "exception.type" } }, { "terms": { "source_field": "exception.message" } } ], "metrics": [ { "source_field": "duration", "metrics": [ { "avg": {} }, { "max": {} }, { "min": {} }, { "sum": {} }, { "value_count": {} } ] } ] } } '
Query on Rollup Index
Request
curl -X GET "localhost:9200/rollup-test/_search?pretty&size=0" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ { "terms": { "process.serviceName": [ "service-xxxxxx" ] } } ] } }, "aggregations": { "timeline": { "date_histogram": { "field": "startTimeMillis", "fixed_interval": "1h" }, "aggs": { "service": { "terms": { "field": "process.serviceName" }, "aggs": { "avg_duration": { "avg": { "field": "duration" } }, "max_duration": { "max": { "field": "duration" } }, "min_duration": { "min": { "field": "duration" } }, "count": { "value_count": { "field": "duration" } }, "sum": { "sum": { "field": "duration" } } } } } } } }'
Response rollup-index-response.txt
Query on Source Index
Request
curl -X GET "localhost:9200/jaeger-span-2021.04.17-000103/_search?pretty&size=0" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ { "terms": { "process.serviceName": [ "service-xxxxxx" ] } } ] } }, "aggregations": { "timeline": { "date_histogram": { "field": "startTimeMillis", "fixed_interval": "1h" }, "aggs": { "service": { "terms": { "field": "process.serviceName" }, "aggs": { "avg_duration": { "avg": { "field": "duration" } }, "max_duration": { "max": { "field": "duration" } }, "min_duration": { "min": { "field": "duration" } }, "count": { "value_count": { "field": "duration" } }, "sum": { "sum": { "field": "duration" } } } } } } } }'
Response source-index-response.txt
Setup Details
All other metrics (SUM, VALUE_COUNT, MIN, MAX) give correct results and match the aggregation metrics of the source index. Only the average is incorrect. Consider the following example, taken from the response to the rollup index query:
{ "key_as_string" : "2021-04-17T02:00:00.000Z", "key" : 1618624800000, "doc_count" : 562, "service" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "service-xxxxxx", "doc_count" : 562, "avg_duration" : { "value" : 754.1463076048377 }, "count" : { "value" : 2847569 }, "min_duration" : { "value" : 37.0 }, "sum" : { "value" : 1.5818190941E10 }, "max_duration" : { "value" : 2.07551568E8 } } ] } }
Here the expected avg_duration is 1.5818190941E10 / 2847569 = 5554.9807365511, but the actual value in the response is avg_duration = 754.1463076048377. Can anyone explain the reason behind this discrepancy?
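The expected average can be recomputed from the sum and value_count that the same bucket reports. A small sketch, using only figures from the response above (the variable names are illustrative):

```python
# Figures from the example bucket in the rollup index response.
total = 1.5818190941e10        # "sum" value
count = 2847569                # "count" (value_count) value
actual_avg = 754.1463076048377 # "avg_duration" value returned

# Average derived from the bucket's own sum and count.
expected_avg = total / count

print(f"expected avg: {expected_avg:.4f}")
print(f"actual avg:   {actual_avg:.4f}")
print(f"expected / actual: {expected_avg / actual_avg:.2f}")
```

Since sum and value_count on the rollup index match the source index, deriving the average from those two aggregations on the client side is one possible workaround until the avg aggregation itself returns the right value.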