Closed gwharton closed 6 years ago
@gwharton, thank you for your report. We were not able to reproduce this issue by following the steps you provided. If you'd like to update it, please reopen the issue. We tested the issue on 2.2.0
@gwharton I have been having the same problem since upgrading to 2.2.0. Have you done anything outside of clearing the cron_schedule table to fix it? I have done the same, but old entries are still not deleted. The timezone is set correctly, but that doesn't seem related.
All of my installations are working as they should. The size of my cron_schedule tables are fairly static at around 1200 rows. This seems to be the correct behaviour, as the Magento settings regarding keeping cron history say to keep 1 hour, and looking at the entries in the table that is correct. The behaviour I saw in 2.1 where the cron_schedule table was only a 100 or so rows seemed to be incorrect behaviour, as it was only keeping the last minute. I suspect this was some sort of timezone problem on the timestamps.
I did have a couple of occasions where one of the cron jobs appeared to start, but had no finish time. This resulted in the table growing in size out of control as new rows for that cron job were added in the pending state, but never got run because the "stuck" job was still running as far as Magento was concerned. Clearing the table seemed to set things going again.
I have no idea why the job was stuck. Feels like Magento should have some method of detecting stuck jobs, or jobs that completed but didn't update their completed timestamp. Without this, Magento never recovers, resulting in the ramp up of CPU usage, eventually hitting 100% in a couple of weeks time as it repeatedly parses through the massive and growing amount of pending jobs every minute.
Unfortunately I haven't been able to reproduce this recently.
One thought that I did have, is that I was doing a lot of development work on a module at the time these problems occurred. During some of these changes/recompiles/updates I would get emails telling me that the cron process failed, or when I restarted mysql that the database connection failed. Perhaps this is when the cron job starts but doesn't finish and then from this point on it just creates the pending jobs forever.
Since this, I now updated my cron jobs to check if Magento is in maintenance mode
* * * * * ! test -e ~/public_html/var/.maintenance.flag && php ~/public_html/bin/magento cron:run .........
This way, it makes sure that the cron jobs are never run while you are in maintenance mode. I don't know if this is recommended or not, but it feels much cleaner that cron jobs aren't run when I am carrying out magento upgrades/recompiles/development.
I have a standard shell script which I run every time I make changes or want to upgrade the store. It puts the store into maintenance mode while it updates magento to latest version, clears all static content, upgrades/recompiles/deploys static content and then clears the caches. Since religiously using this script to enforce maintenance mode, and having the cron jobs disabled while in maintenance mode I have not had any further issues.
#!/bin/sh
php bin/magento maintenance:enable
composer update
rm -rf pub/static/*
rm -rf var/view_preprocessed
rm -rf generated/code
rm -rf var/cache
rm -rf generated/metadata
rm -rf var/page_cache
php bin/magento setup:upgrade
php bin/magento setup:di:compile
php bin/magento setup:static-content:deploy --area=frontend --theme=Gw/frontend --language=en_US
php bin/magento setup:static-content:deploy --area=adminhtml --theme=Gw/backend --language=en_US
php bin/magento cache:clean
php bin/magento cache:enable
php bin/magento maintenance:disable
When updating modules, once I am happy they work on my test store, I put the production store into maintenance mode, upload the necessary updates, then run the above script to finish the deployment.
Unfortunately the cron_schedule table kept growing even after a redeployment in maintenance mode, like you described. To fix my problem, I just wrote a cronjob that cleans old entries in cron_schedule every hour.
Same problem here. @dwirt, how did you change the cronjob?
My dev store went again at about 2am on the 17th. See phpmyadmin attached. The first entry in the table is marked as running. Then there are 6000 odd entries following it, all in the pending state and growing.
@magento-engcom-team I am unable to reproduce this on demand, but it feels like Magento is missing some cleanup code that detects cron jobs that are stuck in the running state. Really it should detect a stuck job, clear it and log some sort of error, allowing the future pending jobs to clear naturally. Just repeatedly adding pending jobs infinitum seems like a poor decision.
I deleted the first row of the table, that was in the running state, and on the next cron run, it cleared all the pending tasks for this stuck cron, however rising to the top of the table, 3 more stuck cron jobs the following day at midday.
These DO correspond to when I was performing maintenance on the store, where there may have been code installed containing errors, or the database dropped etc or other anomoly going on.
@cytracon I didn't change anything. I just added a new cronjob to the crontab.
0 * * * *
mysql magento-db -e "delete from cron_schedule where scheduled_at < date_sub(now(), interval 1 hour)"
Of course you could also write a magento cronjob, but I don't trust those anymore.
(Edit: see gwhartons post below)
Not wanting to stray off topic but @dwirt If it's a shared server then beware of putting your mysql password in the command line like that. It can be viewed by anyone while the process is running.
The preferred method is to create a ~/.my.cnf
file and specify mysql magento-db .....
on the command line. The host, username and password will be read from the file automatically.
The contents of .my.cnf should be
[mysql]
host = <hostname>
user = <username>
password = <password>
You can then secure this file with chmod 600
to keep it safe. You can also add the equivalent section for [mysqldump]
if you use that command in cron jobs for database backups etc.
@gwharton Thanks!
@magento-engcom-team Can this issue be re-opened. I, and several others by the looks of it, are having to go into the database on a regular basis to clear out cron jobs that are stuck in the running state.
It seems to be worse on development installs, where cron jobs may be crashing whilst being run and never being marked as completed. Its not the same job every time. Seems to be random which one gets stuck.
If you don't manually delete the stuck running job, then, on my webserver, the CPU is overwhelmed at 100% within about 3 days as the never ending list of pending jobs in the cron_schedule table increases. If you don't spot the problem, the first you will know about it, is when your webserver is unresponsive as MySQL is overwhelmed by Magento cycling through thousands of pending cron jobs every minute.
@gwharton: yes exactly, I've seen this behavior a couple of times on our projects, a development server was running at 100% CPU for over 2 weeks until someone noticed it was because of a cronjob or indexer which got stuck somehow.
This should really get fixed!
I've been battling some negative behavior with cron:run lately and I think this thread is describing the root cause of my issue. I started up a new virtual machine with Magento and it was working fine at first. But after a week or 2 the server was super slow and unresponsive to the point of not being usable. I eventually noticed there would 10 or more instances of the cron:run process running simultaneously and hammering MySQL. I had the process set to run every minute, as seems to be the default. I walked back the cron schedule to every 8 minutes and that prevented mutliple cron:run processes from running simultaneously.
I finally found this issue and it lead to counting the rows in cron_schedule, which was 208,046! I ran the query posted above and that brought it down to 252 rows.
My site has just been under light developement, no traffic.
After running the query
delete from cron_schedule where scheduled_at < date_sub(now(), interval 1 hour)
now I've switched all 3 standard magento cron process - cron:run, setup:cron:run, cron.php - back to running every minute, and all 3 run pretty much instantly in under 5 - 10 seconds or less.
I'm new to magento, so I can't speak to the cause but i can say anyone that runs will be left with an unusable site, unless they heavily upgrade their hardware. I posted this question to magento.stackexchange.com and another user said they were experiencing the same issue, check out the comments. https://magento.stackexchange.com/questions/201063/should-2-of-the-standard-cron-always-be-running?noredirect=1#comment278625_201063
UPDATE: After adding this query to the crontab after reading about this, it fixed all negative behavior and after switching my 3 magento crons back to running every minute, the cron_schedule table is holding steady at around 1030 - 1050 rows with the delete query deleting about 20 rows every 15 minutes when it runs.
I have the same problem. Magento 2.2.1.
@magento-engcom-team
I have created a test module that implements a cron job that crashes during execution.
It can be downloaded from here Gw_CronCrash.zip
This is a simple cron job that just throws an invalid exception, instead of actually doing anything useful. It simulates a cron job crashing for whatever reason during execution.
Once installed, on the next 1 minute boundary, the exception is logged in var/log/magento.cron.log (if you have setup logging of cron output).
Now if you check your cron_schedule table
SELECT * FROM 'cron_schedule' WHERE 'job_code' = 'Gw_CronCrash'
You will see the first row shows the first time it tried to run the job. The job will be in the "running" state, even though the job crashed and failed a long time ago.
Every 15 minutes, a new batch of 15 jobs in the pending state are added to the cron table with status "pending". No rows are ever removed.
The table will grow forever.
Even if you fix the problem with the code in the module, the cron job for that module is NEVER run again. 1440 rows are added to the cron table every day. Every minute, every row of the cron_schedule table is parsed by Magento. Depending on the capabilities of your machine, your CPU could be maxed out in as little as a week.
The only way out is to manually delete the first entry in the cron_schedule table to remove the "running" job. Magento, then does a nice job of cleaning up the remainder of the "pending" entries, as you would expect.
Could this issue be reopened. It is 100% reproducable with this example module.
@gwharton Your hard work on this is appreciated. I have raised this to the comeng team for reopening, and am investigating this solution myself now.
There is some work happening in this PR https://github.com/magento/magento2/pull/12497 which aims to prevent the same cron group being run concurrently. It doesn't sound like it would fix this issue on the cleaning up of the crashed cron task but it might help in alleviate the issue of piled up cron jobs.
Hello. I have the same issue on Magento 2.2.1. Here is how many cronjobs there are in pending status from 2017-11-22:
SELECT * FROM `ver_cron_schedule` WHERE `status`='pending'
Showing rows 0 - 24 (146437 total, Query took 0.1034 seconds.) [created_at: 2017-11-22 16:56:05... - 2017-11-22 16:56:05...]
@gwharton, thank you for your report. We've created internal ticket(s) MAGETWO-83782 to track progress on the issue.
Not terribly useful but just wanted to add that I ran into this problem on Magento 2.2.1, my count for cron_schedule had grown to 532983. Clearing the schedule solved the problem, so thanks to those who found the reason. I just hope someone can find a fix for the cause of the growing table.
Implementing the stub scheduled cleanup of this table has dropped the execution timeout low enough that jobs are no longer timing out within a 15 minute period.
Read: this is me
#Ansible: Workaround while upstream bug #11002 is being resolved
*/5 * * * * systemd-cat --identifier=magento_cron_cleanup -- flock --timeout="900" /run/lock/magento_cron_cleanup.lock /usr/local/bin/n98-magerun2.phar --root-dir="/var/www/html" db:query 'DELETE FROM cron_schedule WHERE scheduled_at < DATE_SUB(NOW(), INTERVAL 1 HOUR)'; echo "magento_cron_cleanup_exit_code $?" > /var/metrics/prometheus/magento_cron_cleanup.prom
#Ansible: Execute the Magento scheduled job runner
* * * * * systemd-cat --identifier=magento_system_cron -- flock --timeout="900" /run/lock/magento_system_cron.lock /usr/bin/php /var/www/html/bin/magento cron:run; echo "magento_system_cron_exit_code $?" > /var/metrics/prometheus/magento_system_cron.prom
same here!
Hello there, I've got the same problem o Magento 2.2.0 when creating one custom cron job. I've also have Magento 2.2.1 installed on different server without any custom cron jobs and cron is working fine, cron_schedule table contains something around 1k records where on 2.2.0 containing one custom cron job have 150k records.
I can see a bunch of PHP processes running, it's killing my 8CPUs server and MySQL:
Each one is eating 0.5GB of memory.
i can totally relate - i have the same issue as described in this thread - any update??
I can confirm I get this issue on 2.2.2 as well!
I can also confirm this on M2.2.2CE (open source).
select count(*) as 'jobs' from cron_schedule where status='pending';
result: 32567
records
This is on an AWS c4.xlarge (4 vCPUs and 7.5GB RAM) and it's impacting performance quite a bit.
I ran with @andrewhowdencom 's approach, but implemented one without any 3rd party dependencies (such as n98-magerun or prometheus)
I created a directory in my magento installation's var directory called cron_cleanup
to group everything together ... I also am using a standalone php script to interact with the db. Depending on someone's situation, this might not be acceptable, but it works for me. Also, the actual log that the cron entries create might not server any real purpose as-is for most people, so change them to do what you would like or remove them all together.
#Workaround while upstream bug #11002 is being resolved
*/5 * * * * systemd-cat --identifier=magento_cron_cleanup -- flock --timeout="900" /run/lock/magento_cron_cleanup.lock /usr/bin/php /var/www/magento/var/cron_cleanup/execute.php; echo "magento_cron_cleanup_exit_code $?" > /var/www/magento/var/cron_cleanup/issue-11002.cleanup.log
#Execute the Magento scheduled job runner
* * * * * systemd-cat --identifier=magento_system_cron -- flock --timeout="900" /run/lock/magento_system_cron.lock /usr/bin/php /var/www/magento/bin/magento cron:run; echo "magento_system_cron_exit_code $?" > /var/www/magento/var/cron_cleanup/issue-11002.magentosystemcron.log
My execute.php
file is as such:
<?php
require '/var/www/magento/app/bootstrap.php';
$bootstrap = \Magento\Framework\App\Bootstrap::create(BP, $_SERVER);
$objectManager = $bootstrap->getObjectManager();
$resource = $objectManager->get('Magento\Framework\App\ResourceConnection');
$connection = $resource->getConnection();
$tableName = $resource->getTableName('cron_schedule');
$sql = "DELETE FROM {$tableName} WHERE scheduled_at < DATE_SUB(NOW(), INTERVAL 1 HOUR);";
$connection->query($sql);
*make sure the execute.php
has execution permissions chmod +x /var/www/magento/cron_cleanup/execute.php
This approach is working for me so far...I'm on CE (Open Source), but on EE (Commerce) there's also the issue of consumers to take into consideration ... so if someone's installation isn't using RabbitMQ, then the Mysql Queue will still be pounded with consumers. Again, all the implementation details here and approach is fine for me, but maybe not for others..YMMV and change it up to work for you.
Hopefully this helps someone, and thanks @andrewhowdencom and @skukla .
2.2.1 also it happens.
SELECT
main_table.* FROM
cron_scheduleAS
main_tableWHERE (
status= 'pending')
INSERT INTO
cron_schedule(
job_code,
status,
created_at,
scheduled_at) VALUES ('indexer_update_all_views', 'pending', '2018-01-09 10:17:06', '2018-01-09 10:20:00')
UPDATE
cron_scheduleAS
current LEFT JOIN
cron_scheduleAS
existingON existing.job_code = current.job_code AND existing.status = 'running' SET
current.
status= 'running' WHERE (current.schedule_id = '1119') AND (current.status = 'pending') AND (existing.schedule_id IS NULL)
UPDATE
cron_scheduleSET
job_code= 'my_custom_job_cide',
status= 'pending',
messages= NULL,
created_at= '2018-01-08 09:42:03',
scheduled_at= '2018-01-08 09:44:00',
executed_at= NULL,
finished_at= NULL WHERE (schedule_id='1127')
Repeat every minute, we have pretty snowball effect now ;)
I can confirm the same issue with M2.2.2CE We have heavy magento cron job that works 15 minutes. With this bug, we make a big load on the database. So now we use a separate php file that configured directly in Linux crontab :)
The way to reproduce this bug:
$this->getNonExistingObject()->doSomething();
try { .. } catch { ... }
So as we can see there are main two faults for this situation happen:
@magento-team This is a pathetic bug that is wreaking havoc on our cron. Can you at least respond to this?
@wildcard27 for tempory solution you can use
Ok so we've analyzed this issue and here is what we found:
The problem appear since you have entry in cron_schedule which has status = running and it will never be finished. Usual reason to that might be
Then in Magento\Cron\Model\ResourceModel\Schedule we have method trySetJobUniqueStatusAtomic which will permanently return false for this particular job_code. That will results in all new cron schedules with the same job_code to be not executed/managed. They won't be run, they won't be marked as missed and number of these schedules will permantently grow which lead cron:run to work slower and slower.
Solution which i see is that old schedules with status "running" should be marked as "error". I think this lifetime could be configured in admin config.
@magento-team Let me know what do you think about this solution so we can start to think about Pull Request.
I had an issue where mysql/php service was burning 100% CPU, as it seemed that crons were backlogging. Within hours of rebooting the server or restarting the services, everything would come to a stand-still again.
It turned out that a recent composer update had overwritten my .htaccess (something that never occurred to me), thus reducing my memory_limit from 2G to the default value of 756M. Within 2-4 minutes of updating the .htaccess file and clearing out the cron table, my CPU calmed right the way down. Been fine for a couple of weeks now.
Same issue here. Magento 2.2.2 is killing my server. 100% CPU load -> 8 Cores 2.6
Yes indeed, we are also still seeing this happening in Magento 2.2.2, cronjobs still can end up in the state running
forever, which then in turn keeps adding new pending
jobs forever and you end up with > 30.000 jobs in the database, until you manually delete them from the table in the database.
@dmanners: I've noticed you have a backport open to fix this in Magento 2.1, but I think you should hold of with that until this gets properly fixed in Magento 2.2, I have the feeling (not scientifically proved) that the state of crons in Magento 2.1.11 is actually better then in 2.2.2 at this point. Maybe PR https://github.com/magento/magento2/pull/12497 will fix it for good, but I haven't tested it yet.
Cron's are still really a mess and should be fixed rather sooner than later, we have to go into servers a lot lately to fix the state of cronjobs. Even on the Magento Cloud infrastructure we keep running into issues with this.
Same issue here! 150000+ rows in table and growing. Killing the server.
Magento | 2.2.2 Apache | 2.4.29 PHP | 7.0.27 MySQL | 5.6.39-cll-lve
Is this error/bug fixed in 2.2.3?
@jontesamuelsson No this isn't fixed in 2.2.3.
@gwharton Thanks for quick answer. Let's hope the root problem gets fixed soon to :-)
Quick tip for this is described in here: https://alekseon.com/en/blog/post/magento-2-slow-and-cpu-usage-gets-high-this-might-be-the-reason/
And fix itself is here: https://github.com/Alekseon/CleanRunningJobs
Hey @Linek, nice one on the plugin however it's still a core issue with Magento and should be addressed by the developers - this issue has been stagnant for an extremely long time and it's a critical bug.
I'm sure this could be re-worked into a pull request, if it ever gets approved/looked at...
Hi @Jean-PierreGassin I absolutely agree that it should be fixed in the core however our fix is only workaround. I don’t think that deleting some “running” flow if it’s older than x days is perfect solution. And if it is, then what should be an x ...
seem i might be getting this issue too in 2.2.2
n98-magerun2 db:query "SELECT * FROM mags_cron_schedule WHERE status='pending';" --skip-root-check | wc -l
428
@centminmod I think you might be ok there actually. Magento schedules cron jobs in batches, so the pending jobs you are seeing there are probably the scheduled jobs for the next 20 minutes or so. This "scheduling ahead" behaviour can be configured in
Stores -> Configuration -> Advanced -> System -> Cron configuration options for group : xxxx -> Schedule Ahead for
and
Stores -> Configuration -> Advanced -> System -> Cron configuration options for group : xxxx -> Generate Schedules every
The default group settings are to schedule the next 20 minutes cron jobs, every 15 minutes. You should find if operating correctly the table should fluctuate in size with the number of pending jobs being somewhere between about 70 and 500 ish. You should see somewhere between 5 and 20 pending items for each cron job depending on when the query is run.
We wrote an extension to fix these bugs, speed up performance, and control the execution of tasks: https://github.com/magemojo/m2-ce-cron
A part of this issue is going to be fixed by #12805 and it's being worked as we speak. Feel free to test the patch and comment on the PR. Thanks for the help @ericvhileman, is there any plan to move the fixes to the core instead of a separated module?
@miguelbalparda it could be folded into the core, but the extension replaces the cron functionality entirely so would probably need to have the core team approve the undertaking before trying
there's also a few linux specific points of new functionality that would need to be addressed so the 5 or so people running m2 on windows could still have functioning crons
@akellberg comment worked for me
After running the query
delete from cron_schedule where scheduled_at < date_sub(now(), interval 1 hour)
This could also be cause
www>html>var>log>update.log
PHP Memory_limit
[2018-03-20 21:40:02] setup-cron.ERROR: Your current PHP memory limit is 128M. Magento 2 requires it to be set to 756M or more. As a user with root privileges, edit your php.ini file to increase memory_limit.
I ran into this issue on a Magento 2.2.2 Commerce site, even though none of my jobs were getting left in a running
state.
history_success_lifetime
value of 10080
(7 days). With these two CRONs running every minute, if you have ~70 CRON jobs, you can easily end up with hundreds of thousands of success
jobs in the DB.use_separate_process
= 1 will all run in parallel. This is not normally a problem—however if the cron_schedule
table has a massive number of records and each of those CRON jobs tries to delete the individual outdated cron_schedule
entries, you start running into lock wait timeouts. SHOW PROCESSLIST
in MySQL, we would see that multiple processes were trying to delete the same record from the DB at the same time:
We would also see these errors during this time:
[Zend_Db_Statement_Exception]
SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction, query was: DELETE FROM `cron_schedule` WHERE (schedule_id='6594032')
To workaround this issue, we deceased the number of entries in cron_schedule
by going into STORES > Configuration > ADVANCED > System and decreasing the "Success History Lifetime" value from "10080" to "1440". Hopefully the #12497 PR will resolve this issue.
Preconditions
Steps to reproduce
Expected result
Actual result
The cronjob steadily increases in the time taken to complete, at the moment it is taking around 30 seconds to complete, during which time, mysql and php are taking up heavy CPU usage.
A MYSQL query log shows magento churning through all the pending requests, but they are never marked as success. Hence the ever increasing list of jobs to run.
Snippet from the Mysql Query log below