opensearch-project / alerting

📟 Get notified when your data meets certain conditions by setting up monitors, alerts, and notifications
https://opensearch.org/docs/latest/monitoring-plugins/alerting/index/
Apache License 2.0

Alerting Monitors should allow per-second #28

Open adityaj1107 opened 3 years ago

adityaj1107 commented 3 years ago

Issue by Jon-AtAWS Monday Apr 01, 2019 at 21:27 GMT Originally opened as https://github.com/opendistro-for-elasticsearch/alerting/issues/20


Currently, when setting a schedule by interval, I can choose minutes, hours, or days under Every. Please add seconds to this menu as well. There's no need to make me fiddle with a custom cron expression for a simple monitor that runs every 30 seconds (or every 10 seconds).

adityaj1107 commented 3 years ago

Comment by dbbaughe Tuesday Apr 02, 2019 at 18:57 GMT


Hi Jon,

We do not currently allow per-second monitors. The Alerting Elasticsearch plugin only allows monitors to run as frequently as once per minute, which is why the UI only shows minutes, hours, and days. I will move this issue to the Alerting Elasticsearch plugin repo as a feature request.

Thanks, Drew

adityaj1107 commented 3 years ago

Comment by ghost Sunday Apr 07, 2019 at 05:46 GMT


@Jon-AtAWS Can you elaborate a bit on your use case for per-second, 10-second, or 30-second monitoring intervals?

adityaj1107 commented 3 years ago

Comment by Jon-AtAWS Monday Apr 08, 2019 at 21:50 GMT


If I'm monitoring something... security, infra, whatever, seconds can matter. I don't necessarily want to wait a minute, worst case, to find out my website went down.

Apart from that, it seems arbitrary to cut off at minutes. If I want to pay the cost of scaling to handle per-second queries, I should be able to do that. I haven't dug into cron, did we cut the seconds off of that?

adityaj1107 commented 3 years ago

Comment by ghost Tuesday Apr 09, 2019 at 04:36 GMT


We absolutely do want to support shorter intervals in the future but wanted to take a cautious approach to it after gathering user feedback on use-cases.

That said, there are some valid reasons we picked 1 minute as a good lower bound for the initial release:

  1. When the cluster topology changes, it takes some time for the new state to be published and for a different host to take over running the monitors. The 1 minute minimum (equal to twice the default value of discovery.zen.publish_timeout, which is 30s) was chosen to minimize the chances of a missed run. This doesn't completely address the failure mode, but running monitors at a much higher frequency can lead to duplicate or missing runs.
  2. In a multi-user system with a large number of per-second monitors that use inline scripts, we were worried about running into the limit on script.max_compilations_rate, which is currently 15/minute.
  3. It wasn't a completely arbitrary choice to stop at minutes: the traditional Unix cron format doesn't support second-level granularity either.
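
The arithmetic behind points 1 and 3 can be sketched in Python. This is only an illustration, not code from the Alerting plugin: the constant names and the `finest_cron_unit` helper are hypothetical, and the 30s value is the default quoted above rather than something read from a cluster.

```python
# Hypothetical illustration; none of these names come from the Alerting plugin.
PUBLISH_TIMEOUT_SECONDS = 30  # default discovery.zen.publish_timeout quoted above

# The 1 minute lower bound is twice the publish timeout.
MIN_MONITOR_INTERVAL_SECONDS = 2 * PUBLISH_TIMEOUT_SECONDS

# Field order of a traditional five-field Unix cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def finest_cron_unit(expression: str) -> str:
    """Return the finest time unit a traditional cron expression can address."""
    fields = expression.split()
    if len(fields) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(fields)}"
        )
    # The first field is the minute field; there is no seconds field.
    return CRON_FIELDS[0]


if __name__ == "__main__":
    print(MIN_MONITOR_INTERVAL_SECONDS)
    print(finest_cron_unit("*/5 * * * *"))
```

A six-field expression with a leading seconds field, as some scheduler libraries accept, is rejected here on purpose, since the sketch models only the traditional format.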

For the use cases you bring up (security, infra), I think we would rather investigate a model where the monitor runs in response to an external input instead of relying on high-frequency, cron-based polling.

brijos commented 2 years ago

Adding voice of the community from another discussion -

Currently, the minimum time unit of the OpenSearch scheduler plugin is the minute. This can lead to request throttling if many jobs (ISM, monitors) are registered. To avoid throttling, customers need to set jitter or specify "second" in the cron expression to spread out the trigger time of each job.
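
As a rough sketch of the jitter idea, the Python below assigns each job a stable offset within the minute, so jobs that share a schedule do not all fire on the same second. It is an assumption-laden illustration, not the scheduler plugin's implementation: `jitter_offset_seconds` and the job ids are hypothetical.

```python
import hashlib


def jitter_offset_seconds(job_id: str, window_seconds: int = 60) -> int:
    """Map a job id to a stable start offset in [0, window_seconds).

    Hypothetical helper: hashing the id gives every job the same offset on
    each run while spreading different jobs across the window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


if __name__ == "__main__":
    # Hypothetical job ids, for illustration only.
    for job_id in ("monitor-a", "monitor-b", "ism-policy-1"):
        print(job_id, jitter_offset_seconds(job_id))
```

A hash-based offset is used here instead of a random one so that a job keeps the same slot across restarts. A random delay per run would spread load as well, at the cost of a less predictable trigger time.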

Current design

Proposed design

Compete

Other tools support a seconds unit for scheduling.