IlliciteS closed this 1 year ago
It may be related to the following behavior. In the next version, I would like to make this setting configurable from the Praeco UI. https://elastalert2.readthedocs.io/en/latest/elastalert.html
Try adding the following to BaseRule.config:
disable_rules_on_error: false
I added that line to BaseRule.config (and changed nothing else), but it does not work. That said, we found a curious workaround:
If we go to Praeco and, for a frozen alarm, choose Edit -> disable "Limit Execution" and wait a bit, the alarm starts running again. We can then re-enable Limit Execution and the alarm keeps working.
It also works through the YAML. So we plan to write a script that adds a # to comment out limit_execution in the YAML of every alarm and then, about 2 minutes later, removes that # to uncomment it.
Note: we updated to your latest versions of Praeco and Elastalert, and the workaround still works.
There is a comment describing a feature that restricts rule execution to specific times of the day, rather than disabling alerts.
https://github.com/Yelp/elastalert/issues/492#issuecomment-438024625
I've merged this feature into a new branch, beta, and released it as a new package version 0.2.0b1 available on pypi.
This includes a couple other changes as well, like threading support, but you can now limit rule execution to certain times of the day using limit_execution using cron syntax. For example
limit_execution: "* 7-22 * * *" would mean to only run the rule between 7 am and 10 pm every day.
This feature is still in beta, of course, but you're welcome to try.
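To make the cron semantics concrete, here is a minimal sketch of how the minute and hour fields of such an expression could be evaluated. It is not ElastAlert's implementation; `field_matches` and `allowed` are hypothetical helper names, and the sketch assumes the standard five-field cron form and ignores the day, month, and weekday fields.

```python
def field_matches(field: str, value: int) -> bool:
    """Check one cron field ('*', '7-22', '*/5', '0,30', '10-50/10') against a value.

    Assumption: only minute and hour fields are handled, both of which start at 0.
    """
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            lo, hi = 0, 59  # wide enough for minutes; hours never exceed 23 anyway
        elif "-" in rng:
            lo_s, hi_s = rng.split("-")
            lo, hi = int(lo_s), int(hi_s)
        else:
            lo = hi = int(rng)
        if lo <= value <= hi and (value - lo) % step_n == 0:
            return True
    return False


def allowed(expr: str, hour: int, minute: int) -> bool:
    """Return True if the minute and hour fields of expr match the given time."""
    minute_field, hour_field, *_ = expr.split()
    return field_matches(minute_field, minute) and field_matches(hour_field, hour)


if __name__ == "__main__":
    print(allowed("* 7-22 * * *", 12, 30))  # inside the 7-22 hour window
    print(allowed("* 7-22 * * *", 23, 0))   # outside the window
    print(allowed("*/5 * * * *", 9, 10))    # every fifth minute
```

Note that cron ranges are inclusive, so an hour field of 7-22 matches every minute from 07:00 through 22:59.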
Yes, we are already using the limit_execution. To be clear, this:
Which equals, in yaml, to:
If I am not mistaken, right?
And so it's this feature that creates an issue (for us, at least). And we do not disable the alarm, but this feature, to make the alarms work again.
So the script we will implement will do that: First, it will comment the limit_execution in the yaml:
And then it will delete this #, i.e. it will uncomment the setting so that it takes effect again: limit_execution: "0 */1 * * *". The alarms stay enabled the whole time.
Before, the alarms were "frozen": the Query Log showed either "no data" or an old date. After this workaround, the alarms are no longer frozen and work again; the Query Log tab displays all the queries made by the alarms.
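The comment/uncomment steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the rules are `.yaml` files in a directory whose path you supply, `limit_execution` sits at the start of a line, and the function names and the 120-second default wait are hypothetical choices, not anything provided by Praeco or ElastAlert.

```python
import re
import time
from pathlib import Path

# An active limit_execution line at the start of a line.
ACTIVE = re.compile(r"^(limit_execution\s*:.*)$", re.MULTILINE)
# A limit_execution line that has been commented out with '#'.
COMMENTED = re.compile(r"^#\s?(limit_execution\s*:.*)$", re.MULTILINE)


def comment_limit_execution(rule_text: str) -> str:
    """Prefix every active limit_execution line with '#'."""
    return ACTIVE.sub(r"#\1", rule_text)


def uncomment_limit_execution(rule_text: str) -> str:
    """Remove the leading '#' from commented limit_execution lines."""
    return COMMENTED.sub(r"\1", rule_text)


def toggle_rules(rules_dir: str, wait_seconds: int = 120) -> None:
    """Comment out limit_execution in each rule file, wait, then restore it.

    Only files that this call actually changed are restored, so lines that
    were already commented out by hand are left alone.
    """
    changed = []
    for path in sorted(Path(rules_dir).rglob("*.yaml")):
        original = path.read_text(encoding="utf-8")
        updated = comment_limit_execution(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    time.sleep(wait_seconds)  # give the rules time to be reloaded
    for path in changed:
        text = path.read_text(encoding="utf-8")
        path.write_text(uncomment_limit_execution(text), encoding="utf-8")
```

Calling `toggle_rules` with the path to your rules directory would run the whole cycle once. Whether two minutes is long enough for the rule to be picked up is based only on the observation in this thread, so the wait may need adjusting.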
By the way, maybe we are not using Limit Execution the proper way. We use it to run a query every 5 minutes, or every hour, for instance. If we want to run an alarm between 10 am and 11 pm, we use the "Use Time Window" feature, as shown in the first screenshot.
this
Thanks for your answer. That's interesting; this case has been known since 2019. So perhaps we should disable limit_execution for now and let the main cron run every alarm.
Regarding limit_execution, I think there is a bug; I recall a related question in the elastalert2 discussions. https://github.com/jertel/elastalert2/discussions
Hello,
We ran into something we cannot understand. All our alarms are set up manually through the Praeco UI (so not by editing YAML). Some alarms run fine while others just stop working all of a sudden.
An example :
The data in the main view are fine.
Here's the Query Log view. You can see it stopped working on 4/6/2023 at 9:00:00 AM. I don't know why. I could not make it work again, so I had to duplicate the alarm, edit the copy, change its Limit Execution to a slightly different value (like 4 min instead of 5), save it, delete the old broken alarm, then go back to the duplicated alarm and rename it to the original name.
Query log worked... And then, since 6/21/2023 2:34:52 PM, it does not run anymore.
Praeco YAML:
praeco_full_path: "FOTT/Services/High number error access Service TV HISENSE"
praeco_query_builder: "{\"query\":{\"logicalOperator\":\"all\",\"children\":[{\"type\":\"query-builder-rule\",\"query\":{\"rule\":\"actKey\",\"selectedOperator\":\"contains\",\"selectedOperand\":\"actKey\",\"value\":\"accessServiceError\"}},{\"type\":\"query-builder-rule\",\"query\":{\"rule\":\"media\",\"selectedOperator\":\"contains\",\"selectedOperand\":\"media\",\"value\":\"tvhisense\"}}]}}"
alert:
Some alarms, when I apply that duplicate workaround, just don't update at all and show this Query Log tab (while the graph in the overview works perfectly):
Its overview:
Praeco YAML for that one:
praeco_full_path: "FOTT/Usage/Nb de lancement de player bas PLAYSTATION CPFRA"
praeco_query_builder: "{\"query\":{\"logicalOperator\":\"all\",\"children\":[{\"type\":\"query-builder-rule\",\"query\":{\"rule\":\"actKey\",\"selectedOperator\":\"contains\",\"selectedOperand\":\"actKey\",\"value\":\"launchOnePlayer\"}},{\"type\":\"query-builder-rule\",\"query\":{\"rule\":\"media\",\"selectedOperator\":\"contains\",\"selectedOperand\":\"media\",\"value\":\"playstation\"}},{\"type\":\"query-builder-rule\",\"query\":{\"rule\":\"zone\",\"selectedOperator\":\"contains\",\"selectedOperand\":\"zone\",\"value\":\"cpfra\"}}]}}"
alert:
And for some other alarms, after I duplicate them and remove the original, the Query Log tab reverts to the original's Query Log, as if the original had not been deleted at all, keeping the old frozen history. I am wondering whether the first alarm has really been deleted (and if not, why it does not appear in Praeco).
And that's why I have 3 questions:
1 - Any idea why this happens (other than after a Docker container is destroyed and rebuilt, where I have already noticed it)?
2 - Is there a way to "reconnect" all the alarms after such an incident without duplicating them (when that even works)? I tried enabling and disabling an alarm; it does not help.
3 - Are there any logs about a specific alarm in the Praeco container and/or in the Elastalert container? If so, where?
👀 Operating environment