opensearch-project / OpenSearch-Dashboards

📊 Open source visualization dashboards for OpenSearch.
https://opensearch.org/docs/latest/dashboards/index/
Apache License 2.0

[BUG] SPAM from bucket monitor when number of results over 50 #6710

Open NiFeuu opened 2 months ago

NiFeuu commented 2 months ago

Describe the bug

When a monitor is "by bucket" and throttling is set to a value (60 minutes in my case), the throttling time is no longer respected once the number of results from the monitor exceeds 50: the monitor executes the action every time it runs. For example, I have a monitor that checks whether CPU usage is over 70% (value 0.7). When 49 servers have CPU over 70%, I get one message per host every 60 minutes, and all is OK. But when 51 hosts are over 70%, I get one message per minute (the monitor is scheduled to run every minute).
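To make the size of the problem concrete, here is a rough back-of-the-envelope sketch of the message counts over one hour. It is only arithmetic on the numbers given above, not the Alerting plugin's logic. The function names are made up for illustration, and the "observed" figure assumes the action fires once per host on every execution.

```shell
#!/usr/bin/env bash

# Expected: each triggered host notifies at most once per throttle window.
# Arguments: number of hosts, run duration (minutes), throttle (minutes).
expected_messages() {
    local hosts=$1 run_minutes=$2 throttle_minutes=$3
    # Ceiling division: number of throttle windows started during the run
    local windows=$(( (run_minutes + throttle_minutes - 1) / throttle_minutes ))
    echo $(( hosts * windows ))
}

# Observed (assumption): one message per host on every monitor execution.
# Arguments: number of hosts, run duration (minutes), monitor interval (minutes).
observed_messages() {
    local hosts=$1 run_minutes=$2 interval_minutes=$3
    echo $(( hosts * (run_minutes / interval_minutes) ))
}

# 51 hosts, 60-minute throttle, monitor scheduled every minute, over one hour
echo "expected: $(expected_messages 51 60 60)"
echo "observed: $(observed_messages 51 60 1)"
```

With these assumptions, one hour with 51 hosts should produce 51 messages, while an unthrottled action would produce 3060.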

To Reproduce

The index I am using in this example is named metricbeat-ng. It indexes metrics from Metricbeat.

1 Create a monitor that executes every minute, with a throttling time of 60 minutes. Set the action to send a message to something you can log (for example, an email destination).

2 The content of the monitor is:

{
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "@timestamp": {
                            "from": "{{period_end}}||-2m",
                            "to": "{{period_end}}",
                            "include_lower": true,
                            "include_upper": true,
                            "format": "epoch_millis",
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "aggregations": {
        "composite_agg": {
            "composite": {
                "size": 1000,
                "sources": [
                    {
                        "host.hostname": {
                            "terms": {
                                "field": "host.hostname",
                                "missing_bucket": false,
                                "order": "asc"
                            }
                        }
                    }
                ]
            },
            "aggregations": {
                "avg_system_cpu_total_norm_pct": {
                    "avg": {
                        "field": "system.cpu.total.norm.pct"
                    }
                }
            }
        }
    }
}

3 The content of the trigger is:

{
    "buckets_path": {
        "avg_system_cpu_total_norm_pct": "avg_system_cpu_total_norm_pct"
    },
    "parent_bucket_path": "composite_agg",
    "script": {
        "source": "params.avg_system_cpu_total_norm_pct > 0.7",
        "lang": "painless"
    },
    "gap_policy": "skip"
}

4 Execute this script to send a CPU value of 0.75 (75%) for 60 hosts, one document per host for every 10-second step over 5 minutes (30 * 10 seconds):

# Requires GNU date (for the -d option) and curl
for i in {0..30}; do
    for j in {1..60}; do
        # Timestamp offset by i*10 seconds from now, in UTC
        ts=$(date -u -d "+$((i*10)) seconds" +"%Y-%m-%dT%H:%M:%S.000Z")
        # Quote the JSON so the shell passes it to curl as a single argument
        json_content="{\"@timestamp\":\"${ts}\",\"host.hostname\":\"Host-Test${j}\",\"system.cpu.total.norm.pct\":0.75}"
        echo "$json_content"
        curl -k -i -u 'smartpulse:#Smartpulse' -d "$json_content" -H 'Content-Type: application/json' -X POST https://localhost:9200/metricbeat-ng/_doc
    done
done
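The loop above issues one HTTP request per document (1860 in total). As a possible alternative, the same documents could be sent in a single request through the `_bulk` endpoint. The sketch below only builds the newline-delimited body. The function name `build_bulk_body` is made up for this example, and it assumes GNU `date` like the script above.

```shell
#!/usr/bin/env bash

# Print a _bulk request body: one action line plus one document line per host
# and per 10-second step. Arguments: number of hosts, number of steps
# (steps are counted from 0, so 30 gives 31 timestamps, like {0..30}).
build_bulk_body() {
    local hosts=$1 steps=$2
    local i j ts
    for ((i = 0; i <= steps; i++)); do
        # Timestamp offset by i*10 seconds from now, in UTC (GNU date)
        ts=$(date -u -d "+$((i * 10)) seconds" +"%Y-%m-%dT%H:%M:%S.000Z")
        for ((j = 1; j <= hosts; j++)); do
            printf '{"index":{"_index":"metricbeat-ng"}}\n'
            printf '{"@timestamp":"%s","host.hostname":"Host-Test%d","system.cpu.total.norm.pct":0.75}\n' "$ts" "$j"
        done
    done
}

# Possible usage (not executed here; replace USER:PASSWORD with real credentials):
# build_bulk_body 60 30 | curl -k -u 'USER:PASSWORD' \
#     -H 'Content-Type: application/x-ndjson' \
#     -X POST https://localhost:9200/_bulk --data-binary @-
```

`--data-binary` is used instead of `-d` because the bulk body must keep its newlines, including the final one.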

Expected behavior

The throttling time is respected in all cases.

OpenSearch Version

OpenSearch 2.12

Dashboards Version

OpenSearch Dashboards 2.12

Plugins

Plugin Alerting

Additional context

I have also noticed that with over 500 buckets, the number of alerts increases at each execution of the monitor. Normally it should stay the same. For example, for 510 hosts, the first minute it reports 510 alerts, the second minute 520, the third 530, and so on. This is just an example with Metricbeat, but we have seen the same behaviour on other indices.

NiFeuu commented 2 months ago

Sorry, I've posted this issue in the OpenSearch Dashboards project. Maybe it should go to the alerting section?

kavilla commented 2 months ago

@NiFeuu, thanks for opening!

@opensearch-project/admin please redirect to alerting dashboards repo