opensearch-project / alerting

📟 Get notified when your data meets certain conditions by setting up monitors, alerts, and notifications
https://opensearch.org/docs/latest/monitoring-plugins/alerting/index/
Apache License 2.0
62 stars 102 forks

Alerts having Error: Failed fetching inputs: GeneralScriptException[Failed to compile inline script #706

Open divyankm opened 1 year ago

divyankm commented 1 year ago

OS Version: 1.2.0

Alerting Plugin version: 1.2.0

We have 100+ monitors in the Alerting plugin, each with a trigger condition that is checked every minute.

We are getting error alerts in the .opendistro-alerting-alert* index.

Error:

Failed fetching inputs:
GeneralScriptException[Failed to compile inline script [{"size":0,"query":{"bool":{"filter":[{"range":{"event_timestamp":{"from":"{{period_end}}||-5m","to":"{{period_end}}","include_lower":true,"include_upper":true,"format":"epoch_millis","boost":1.0}}},{"terms":{"tag.id.keyword":["I_RM3201_RTD_09_3879d04a-c053-49d7-b0c3-07e559588261"],"boost":1.0}},{"terms":{"tag.name.keyword":["I_RM3201_RTD_09"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"I_RM3201_RTD_09":{"filter":{"term":{"tag.name.keyword":{"value":"I_RM3201_RTD_09","boost":1.0}}},"aggregations":{"I_RM3201_RTD_09_val":{"stats":{"field":"tag.value"}}}}}}] using lang [mustache]]; nested: CircuitBreakingException[[script] Too many dynamic script compilations within, max: [510/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting];; org.opensearch.common.breaker.CircuitBreakingException: [script] Too many dynamic script compilations within, max: [510/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting

I changed script.max_compilations_rate from 75/5m to 510/5m but am still getting the same error. Is there anything else that can be done to mitigate this error?

Ref link: Understanding and fixing “too many script compilations” errors in Elasticsearch

Logs:

GET _cluster/settings
{
  "persistent" : {
    "script" : {
      "max_compilations_rate" : "520/5m"
    }
  },
  "transient" : {
    "script" : {
      "max_compilations_rate" : "510/5m"
    }
  }
}
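
A note on reading the settings above: transient cluster settings take precedence over persistent ones, which is why the error reports 510/5m rather than 520/5m. The sketch below picks the effective value and converts it to a per-minute budget. The helper names `parse_rate` and `effective_rate` are hypothetical, and the parser only handles the simple `count/duration` form with `s`, `m`, or `h` units.

```python
import re

# Seconds per unit for the simplified "count/duration" form (assumption: s, m, h only).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_rate(rate):
    """Parse a rate such as '510/5m' into (count, window_seconds)."""
    match = re.fullmatch(r"(\d+)/(\d+)([smh])", rate)
    if match is None:
        raise ValueError(f"unsupported rate: {rate!r}")
    count, amount, unit = match.groups()
    return int(count), int(amount) * UNIT_SECONDS[unit]


def effective_rate(settings):
    """Return the value in effect: transient wins over persistent."""
    for scope in ("transient", "persistent"):
        rate = settings.get(scope, {}).get("script", {}).get("max_compilations_rate")
        if rate is not None:
            return rate
    return None


# The GET _cluster/settings response from this issue.
cluster_settings = {
    "persistent": {"script": {"max_compilations_rate": "520/5m"}},
    "transient": {"script": {"max_compilations_rate": "510/5m"}},
}

rate = effective_rate(cluster_settings)
count, window_seconds = parse_rate(rate)
per_minute = count * 60 / window_seconds
print(f"{rate} allows about {per_minute:.0f} compilations per minute")
```

With 100+ monitors each running once a minute, a budget of roughly 102 compilations per minute leaves almost no headroom if every run has to recompile its query template.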

GET /_nodes/stats?metric=script&filter_path=nodes.*.script.*
{
  "nodes" : {
    "<...>" : {
      "script" : {
        "compilations" : 7880,
        "cache_evictions" : 7780,
        "compilation_limit_triggered" : 6232
      }
    }
  }
}
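
The node stats above suggest cache churn rather than just a low limit: nearly every compiled script has been evicted again. A minimal sketch of that arithmetic follows, using only the numbers reported in this issue. The function name `eviction_ratio` is hypothetical. The likely explanation, stated as an assumption, is that the script cache is small by default (100 entries, if the documented default applies to this version) and cannot hold one distinct template per monitor.

```python
def eviction_ratio(script_stats):
    """Fraction of compiled scripts that were later evicted from the cache."""
    compilations = script_stats["compilations"]
    if compilations == 0:
        return 0.0
    return script_stats["cache_evictions"] / compilations


# Values copied from the _nodes/stats output in this issue.
script_stats = {
    "compilations": 7880,
    "cache_evictions": 7780,
    "compilation_limit_triggered": 6232,
}

ratio = eviction_ratio(script_stats)
print(f"{ratio:.1%} of compiled scripts were evicted")
```

If the ratio stays this high, raising `script.max_compilations_rate` only moves the ceiling; the scripts are still recompiled on almost every run.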

Sample Monitor:

{
  "_index": ".opendistro-alerting-config",
  "_type": "_doc",
  "_id": "VtZ4D4UBaGweITaE6WJ7",
  "_version": 1,
  "_score": 0,
  "_source": {
    "monitor": {
      "type": "monitor",
      "schema_version": 4,
      "name": "E2 Motor WS Brg temp 1_NDE RTD 065441",
      "monitor_type": "query_level_monitor",
      "user": {
        "name": "admin",
        "backend_roles": [
          "admin"
        ],
        "roles": [
          "own_index",
          "all_access"
        ],
        "custom_attribute_names": [],
        "user_requested_tenant": null
      },
      "enabled": true,
      "enabled_time": 1671001860473,
      "schedule": {
        "period": {
          "interval": 1,
          "unit": "MINUTES"
        }
      },
      "inputs": [
        {
          "search": {
            "indices": [
              "abc*"
            ],
            "query": {
              "size": 0,
              "query": {
                "bool": {
                  "filter": [
                    {
                      "range": {
                        "event_timestamp": {
                          "from": "{{period_end}}||-5m",
                          "to": "{{period_end}}",
                          "include_lower": true,
                          "include_upper": true,
                          "format": "epoch_millis",
                          "boost": 1
                        }
                      }
                    },
                    {
                      "terms": {
                        "tag.id.keyword": [
                          "I_66fd5c-06bb-43cc-b88c-fc66d528d031"
                        ],
                        "boost": 1
                      }
                    },
                    {
                      "terms": {
                        "tag.name.keyword": [
                          "<>"
                        ],
                        "boost": 1
                      }
                    }
                  ],
                  "adjust_pure_negative": true,
                  "boost": 1
                }
              },
              "aggregations": {
                "<>": {
                  "filter": {
                    "term": {
                      "tag.name.keyword": {
                        "value": "<>",
                        "boost": 1
                      }
                    }
                  },
                  "aggregations": {
                    "<>_val": {
                      "stats": {
                        "field": "tag.value"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      ],
      "triggers": [
        {
          "query_level_trigger": {
            "id": "VdZ4D4UBaGweITaE6WJ5",
            "name": "Temp >= 50",
            "severity": "1",
            "condition": {
              "script": {
                "source": "return ctx.results[0].aggregations.<>.<>_val.max == null ? false :(ctx.results[0].aggregations.<>.<>_val.max/10) >= 50",
                "lang": "painless"
              }
            },
            "actions": []
          }
        }
      ],
      "last_update_time": 1671001860473
    }
  },
  "fields": {
    "monitor.last_update_time": [
      "2022-12-14T07:11:00.473Z"
    ],
    "monitor.enabled_time": [
      "2022-12-14T07:11:00.473Z"
    ]
  }
}


lezzago commented 1 year ago

Have you tried to increase the max_compilations_rate to a very high number such as 52000/5m? Also this seems like it is a limitation on the OpenSearch cluster itself and the configuration of the cluster. Is there maybe a way where you can not run the monitors every minute and make sure the monitors' schedules are not all the same, so they don't all run at the same time during each execution?
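
For reference, a sketch of what raising the limit could look like as a request body for the cluster settings API (`PUT _cluster/settings`). The function name `compilation_rate_update` is hypothetical. Clearing the transient value with `null` is an assumption about intent: the cluster in this issue has both a transient and a persistent value set, and the transient one would otherwise keep overriding the new persistent value.

```python
import json


def compilation_rate_update(rate):
    """Build a cluster settings body that sets the rate persistently
    and resets any transient override (None serializes to JSON null)."""
    return {
        "persistent": {"script.max_compilations_rate": rate},
        "transient": {"script.max_compilations_rate": None},
    }


body = compilation_rate_update("52000/5m")
print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```

The body can be pasted into Dev Tools or sent with any HTTP client; no client call is made here.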

divyankm commented 1 year ago

Have you tried to increase the max_compilations_rate to a very high number such as 52000/5m?

Also this seems like it is a limitation on the OpenSearch cluster itself and the configuration of the cluster.

  • Running a single-node OpenSearch cluster in a Docker container, version 1.2.0. Any documentation of standard cluster parameters would help. In future, for the production environment, we will be using a multi-node OpenSearch cluster with docker-swarm.

Is there maybe a way where you can not run the monitors every minute and make sure the monitors' schedules are not all the same, so they don't all run at the same time during each execution?

I fear this cannot be avoided: currently, 100+ monitors have a trigger interval of 1 min and scan the last 5 min, and we need these same monitor parameters to generate alerts. In future, we will be adding 100+ monitors with the same monitor condition, which will definitely result in multiple monitors running at the same time during each execution.

I do not get errors as long as no more than 30 monitors are enabled in the above configuration. Beyond 30 monitors, the alerting engine is unable to run the monitors.

Ideally, a single OpenSearch cluster will have 500 monitors with the same monitor condition, so we need to address this performance issue. Please let me know of any workaround that can make it work.

Thanks for the acknowledgment.

lezzago commented 1 year ago

Ideally, a single OpenSearch cluster will have 500 monitors with the same monitor condition, so we need to address this performance issue. Please let me know of any workaround that can make it work.

Can you explain your use case? If they have the same monitor condition, why do you need 500 of the same monitor?

divyankm commented 1 year ago

Can you explain your use case? If they have the same monitor condition, why do you need 500 of the same monitor?

We have hundreds of IoT sensors/devices installed, providing real-time feeds. The trigger condition of each device is similar; the only difference is the device name given to each monitor, which results in the creation of hundreds of monitors.

Sample monitor condition attached above.
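
To illustrate why one monitor per device is costly here: the device name is written directly into each monitor's query, so every monitor carries a different template source. Assuming compiled scripts are cached by their source text, 500 monitors mean 500 separate compilations competing for the cache. The sketch below uses hypothetical device names and a trimmed version of the sample query; `inline_query` and `grouped_query` are illustrative helpers, not part of the plugin.

```python
import json

TIME_FILTER = {
    "range": {
        "event_timestamp": {
            "from": "{{period_end}}||-5m",
            "to": "{{period_end}}",
            "format": "epoch_millis",
        }
    }
}


def inline_query(device_name):
    """Query source with the device name baked in, as in the sample monitor."""
    query = {
        "size": 0,
        "query": {"bool": {"filter": [
            TIME_FILTER,
            {"terms": {"tag.name.keyword": [device_name]}},
        ]}},
    }
    return json.dumps(query, sort_keys=True)


def grouped_query():
    """One query source for all devices, grouping by device name instead."""
    query = {
        "size": 0,
        "query": {"bool": {"filter": [TIME_FILTER]}},
        "aggregations": {"by_device": {"terms": {"field": "tag.name.keyword"}}},
    }
    return json.dumps(query, sort_keys=True)


# Hypothetical device names standing in for the real sensor tags.
devices = [f"device_{i:03d}" for i in range(500)]

distinct_sources = {inline_query(name) for name in devices}
shared_sources = {grouped_query() for _ in devices}

print(len(distinct_sources), "distinct templates with inlined names")
print(len(shared_sources), "template when grouping by device")
```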

lezzago commented 1 year ago

I believe bucket-level monitors fit your use case better: you can bucket by device name so you have one monitor instead of hundreds. Documentation: https://opensearch.org/docs/latest/monitoring-plugins/alerting/monitors/#monitor-types
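
A rough sketch of what such a bucket-level monitor could look like, adapted from the sample monitor in this issue. This is not a tested configuration: the aggregation names `by_device` and `max_val`, the monitor name, and the helper `bucket_level_monitor` are hypothetical, and the body would still need to be checked against the bucket-level monitor documentation for the installed plugin version before being posted to the monitors API.

```python
import json


def bucket_level_monitor(index_pattern, threshold):
    """Build one monitor body that evaluates the condition per device bucket."""
    query = {
        "size": 0,
        "query": {"bool": {"filter": [
            {"range": {"event_timestamp": {
                "from": "{{period_end}}||-5m",
                "to": "{{period_end}}",
                "format": "epoch_millis",
            }}}
        ]}},
        "aggregations": {
            # One bucket per device name replaces one monitor per device.
            "by_device": {
                "composite": {
                    "sources": [{"device": {"terms": {"field": "tag.name.keyword"}}}]
                },
                "aggregations": {"max_val": {"max": {"field": "tag.value"}}},
            }
        },
    }
    trigger = {
        "bucket_level_trigger": {
            "name": f"Temp >= {threshold}",
            "severity": "1",
            "condition": {
                "buckets_path": {"max_val": "max_val"},
                "parent_bucket_path": "by_device",
                # Same scaling as the original trigger: raw value / 10.
                "script": {
                    "source": f"params.max_val / 10 >= {threshold}",
                    "lang": "painless",
                },
            },
            "actions": [],
        }
    }
    return {
        "type": "monitor",
        "name": "Device temperature (all devices)",
        "monitor_type": "bucket_level_monitor",
        "enabled": True,
        "schedule": {"period": {"interval": 1, "unit": "MINUTES"}},
        "inputs": [{"search": {"indices": [index_pattern], "query": query}}],
        "triggers": [trigger],
    }


monitor = bucket_level_monitor("abc*", 50)
print(json.dumps(monitor, indent=2))
```

Because the device name is no longer part of the query text, a single template covers every device. Composite page size and actions are left out of the sketch.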