Open PerRommetveit opened 6 years ago
I think I found a solution to the issue above. The ever-growing executions table, which holds data about minicron job executions, slows minicron down significantly.
The following command makes it snappy again. Adjust your paths if these are not the same as mine.
/usr/bin/sqlite3 /opt/minicron/lib/vendor/ruby/2.2.0/gems/minicron-0.9.7.1480251919/db/minicron.sqlite3 "DELETE FROM executions;" ".exit"
It might be an idea to make this a configuration parameter, e.g. preserve_executionhistory=7days or similar.
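A retention setting like that could be sketched roughly as follows. This is only an illustration, not minicron's implementation: the function name `prune_executions` and the `keep_days` parameter are hypothetical, and it assumes the SQLite `executions` table has a `created_at` column stored as UTC text in `YYYY-MM-DD HH:MM:SS` form.

```python
import sqlite3


def prune_executions(conn: sqlite3.Connection, keep_days: int) -> int:
    """Delete executions older than keep_days; return the number of rows removed."""
    if keep_days < 0:
        raise ValueError("keep_days must be non-negative")
    # Assumption: created_at is UTC text like 'YYYY-MM-DD HH:MM:SS',
    # so it compares correctly against the output of SQLite's datetime().
    cur = conn.execute(
        "DELETE FROM executions WHERE created_at < datetime('now', ?)",
        (f"-{keep_days} days",),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # In-memory stand-in for minicron.sqlite3 with a reduced, assumed schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE executions (id INTEGER PRIMARY KEY, created_at TEXT)"
    )
    conn.execute(
        "INSERT INTO executions (created_at) VALUES (datetime('now', '-10 days'))"
    )
    conn.execute(
        "INSERT INTO executions (created_at) VALUES (datetime('now', '-1 days'))"
    )
    removed = prune_executions(conn, keep_days=7)
    print(f"removed {removed} old execution(s)")
```

Unlike a plain `DELETE FROM executions;`, this keeps recent history visible in the web UI while still bounding the table size.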
Out of interest, how many executions did you have?
I'm currently working in general on improving performance for v1.0. I'm hopeful there is a way other than deleting data to improve this :)
Thanks for the info btw!
You are most welcome James, and thanks for creating good software, looking forward to v1.0!
I am not sure exactly how many entries there were in the executions table. But I had about 15 jobs, and some of them running every 3 minutes, so naturally that's 480 entries just for one job in a day. Multiply that up, and it becomes a significant number of rows.
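For a rough sense of scale, the arithmetic above can be written out. The per-job figure follows directly from the 3-minute interval; the total is only an upper bound that assumes all 15 jobs run that often, which is not what was stated.

```python
MINUTES_PER_DAY = 24 * 60


def executions_per_day(interval_minutes: int) -> int:
    """Rows added to the executions table per day by one job on a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


per_job = executions_per_day(3)  # 1440 // 3 = 480
# Upper bound only: assumes every one of the 15 jobs runs every 3 minutes.
worst_case_per_day = 15 * per_job  # 7200
print(per_job, worst_case_per_day)
```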
The symptoms were that jobs would occasionally not execute, and I received error messages in my Slack channel as described in the OP. The web UI also became much slower over time. May I suggest that when viewing a job, not all executions are loaded at once, but perhaps only the first 50, with the user able to request the next 50 and so on? Loading everything at once is quite slow when a lot of rows are involved.
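The paging idea could look something like the sketch below. It is not how minicron queries its database: the helper `fetch_executions_page` is hypothetical, the `id` and `job_id` columns are assumed, and showing the newest executions first is a design choice made here.

```python
import sqlite3


def fetch_executions_page(conn, job_id, page, page_size=50):
    """Return one page of a job's executions, newest first (page is 0-based)."""
    return conn.execute(
        "SELECT id, created_at FROM executions "
        "WHERE job_id = ? ORDER BY id DESC LIMIT ? OFFSET ?",
        (job_id, page_size, page * page_size),
    ).fetchall()


if __name__ == "__main__":
    # In-memory stand-in with a reduced, assumed schema and 120 rows for job 1.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE executions "
        "(id INTEGER PRIMARY KEY, job_id INTEGER, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO executions (job_id, created_at) VALUES (?, datetime('now'))",
        [(1,)] * 120,
    )
    first = fetch_executions_page(conn, job_id=1, page=0)
    last = fetch_executions_page(conn, job_id=1, page=2)
    print(len(first), len(last))  # 50 rows on the first page, 20 on the third
```

With an index on `job_id`, each page should only touch a small slice of the table instead of every row for the job.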
I'm also not sure why the 'failed to execute at its expected time' errors happened, but I suspect that the part of the code that tries to run a job has a timeout on some DB read, and when it can't get the data it needs in time, it skips the execution. That's not the desired behaviour, so if it can be fixed, that would be great.
To reproduce the issue, you could set up a test instance, install minicron with a SQLite database, have a number of jobs (e.g. 20) run every minute, and set up notifications to a Slack channel. After a while you will see jobs starting to fail to execute and the web UI becoming less responsive. I would think increasing the number of jobs significantly would trigger these issues sooner.
@PerRommetveit we have the same issue - I scheduled a job in minicron to clean up the executions and alerts tables in the minicron database:
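#!/bin/bash
# Prune old minicron rows; adjust the credentials for your setup.
USER="root"
PASSWORD=""
DB="minicron"
# Remove executions created before midnight three days ago.
mysql -u"$USER" -p"$PASSWORD" "$DB" -e "DELETE FROM executions WHERE created_at < DATE_ADD(CURDATE(), INTERVAL -3 DAY);"
# Remove alerts sent before midnight three days ago.
mysql -u"$USER" -p"$PASSWORD" "$DB" -e "DELETE FROM alerts WHERE sent_at < DATE_ADD(CURDATE(), INTERVAL -3 DAY);"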
Version: minicron 0.9.7
OS: Ubuntu 16.04.4 LTS
It's a recurring issue that jobs fail to execute at their expected time. I've got 10 cronjobs set up with minicron, and every day I get messages like this in my Slack channel:
Job #8 failed to execute at its expected time - 2018-04-23 09:29:30 +0000
Job #11 failed to execute at its expected time - 2018-04-23 09:29:30 +0000
Job #16 failed to execute at its expected time - 2018-04-23 09:29:30 +0000
When looking at the Executions history for a particular job, the run that was alerted as "failed to execute" has no entry in the history for that point in time. Executions before and after are fine though, and most executions for a job run just fine. I just checked a particular job, and it failed 2 times on a random day; that's a failure rate of about 0.41% (0.0041 as a fraction). Minicron simply does not run the job and does not log why, not even when the verbose and debug flags are enabled on the minicron server at runtime.
Any idea what the issue can be here? Let me know if you need more information.