magento / magento2

Prior to making any Submission(s), you must sign an Adobe Contributor License Agreement, available here at: https://opensource.adobe.com/cla.html. All Submissions you make to Adobe Inc. and its affiliates, assigns and subsidiaries (collectively “Adobe”) are subject to the terms of the Adobe Contributor License Agreement.
http://www.magento.com
Open Software License 3.0
11.55k stars 9.32k forks source link

[2.2.0-*] cron_schedule forever increasing in size. Lots of pending jobs never cleared #11002

Closed gwharton closed 6 years ago

gwharton commented 7 years ago

Preconditions

  1. Magento 2.2.0-rc30 running on Ubunti 16.04
  2. Deployed initially from zip, but updated to 2.2.0-rc30 using composer

Steps to reproduce

  1. Nothing, just look at the cron_schedule table

Expected result

  1. On 2.1.9 my cron_schedule table is around 180 items in size. Its size is pretty much static. A snapshot shows the vast majority of jobs are in the success state, with a couple of pending jobs about to be run in the next minute or so.

Actual result

  1. On 2.2.0-rc30 which has been running for around 8 days (upgraded from previous rc) the size of the cron_schedule table is around 6500 items in size. The size is constantly increasing every minute. The majority of the jobs are in the pending state. Some are marked as success.

The cronjob steadily increases in the time taken to complete, at the moment it is taking around 30 seconds to complete, during which time, mysql and php are taking up heavy CPU usage.

A MYSQL query log shows magento churning through all the pending requests, but they are never marked as success. Hence the ever increasing list of jobs to run.

Snippet from the Mysql Query log below

90 Query    START TRANSACTION
90 Query    UPDATE "cron_schedule" SET "job_code" = "catalog_product_outdated_price_values_cleanup", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:48:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189337")
90 Query    COMMIT
90 Query    UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189338") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query    START TRANSACTION
90 Query    UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:33:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189338")
90 Query    COMMIT
90 Query    UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189339") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query    START TRANSACTION
90 Query    UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:34:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189339")
90 Query    COMMIT
90 Query    UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189340") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query    START TRANSACTION
90 Query    UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:35:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189340")
90 Query    COMMIT
90 Query    UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189341") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query    START TRANSACTION
90 Query    UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:36:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189341")
90 Query    COMMIT
90 Query    UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189342") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
gwharton commented 6 years ago

image Spot the two occurrences of this bug this week. Managed to catch both in time before any damage done. I have the MYSQL DELETE fix setup as a cron job for my live and production stores, but this was a temporary store I installed for some destructive development which didn't have the MYSQL DELETE cron job set up.

Tofandel commented 6 years ago

Referenced here https://github.com/magento/magento2/issues/15190 and here https://github.com/magento/magento2/issues/4978

image

Any ETA on the update, is it gonna be in version 2.2.6 ?

That snowball effect is crashing a lot of servers worldwide ;)

Workaround to minimize the impact : Run the cron with "flock -w 1 cron.lock php bin/magento cron:run" this cron should be run every minute (it's very important because the heartbeat is scheduled every minute in magento like this the table will not be flooded exponentially

ajankuv commented 6 years ago

@Tofandel These people just don't care.

kandy commented 6 years ago

@Tofandel @ajankuv I believe that most of cron related issues were fixed in 2.2.5 (except default lifetime for records). Can you confirm that you try to reproduce it on this version?

Tofandel commented 6 years ago

No I'm on version 2.2.3, I was preparing an update to 2.2.5 but I didn't see the cron fix in any changelog @kandy I'll see after that update

kandy commented 6 years ago

It was done by community in https://github.com/magento/magento2/pull/12497 PR Also you can apply patches for 2.2.3 from this comment https://github.com/magento/magento2/pull/12497#issuecomment-401392609

gwharton commented 6 years ago

I can confirm that in 2.2.5, certainly in my situation, that cron jobs that crash during execution are now marked as "error" in cron_schedule instead of sitting in the running state. This means that the cron cleanup code now properly deals with these jobs and successive jobs run as planned. Initial testing shows the race condition that allowed the cron_schedule table to grow infinitely in size is now fixed. Closing this issue as fixed. Phew!

robfico commented 6 years ago

Will there be a patch / fix for 2.1.x?

AgentGod commented 6 years ago

After updating from 2.2.3 to 2.2.5 I have the same Cron issue. The table is growing and I don't know how to proceed.

gwharton commented 6 years ago

Yeah, since 2.2.5 ive had it twice so it looks like it isnt fixed. Look for the entry in cron-schedule table that is in the running state and delete that row. Magento will then clean up for you.

AgentGod commented 6 years ago

My bad, the table have little bit more that 10 000, but I get error

[Magento\Framework\DB\Adapter\DeadlockException]

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction, query was: DELETE FROM cron_schedule WHERE (status = 'missed') AND (job_code in ('currency_rates_update', 'backend_clean_cache', 'visitor_clean', 'catalog_index_refresh_price', 'catalog_product_flat_indexer_store_cleanup', 'catalog_product_outdated_price_values_cleanup', 'catalog_product_frontend_actions_flush', 'catalog_product_attribute_value_synchronize', 'catalogrule_apply_all', 'system_backup', 'security_clean_admin_expired_sessions', 'security_clean_password_reset_request_event_records', 'sales_clean_quotes', 'sales_clean_orders', 'aggregate_sales_report_order_data', 'aggregate_sales_report_invoiced_data', 'aggregate_sales_report_refunded_data', 'aggregate_sales_report_bestsellers_data', 'sales_grid_order_async_insert', 'sales_grid_order_invoice_async_insert', 'sales_grid_order_shipment_async_insert', 'sales_grid_order_creditmemo_async_insert', 'sales_sendorder emails', 'sales_send_order_invoice_emails', 'sales_send_order_shipment_emails', 'sales_send_order_creditmemo_emails', 'paypal_fetch_settlement_reports', 'outdated_authentication_failures_cleanup', 'expired_tokens_cleanup', 'newsletter_send_all', 'analytics_subscribe', 'analytics_update', 'analytics_collect_data', 'magento_newrelicreporting_cron', 'catalog_product_alert', 'aggregate_sales_report_coupons_data', 'persistent_clear_expired', 'captcha_delete_old_attempts', 'captcha_delete_expired_images', 'aggregate_sales_report_shipment_data', 'sitemap_generate', 'get_amazon_capture_updates', 'get_amazon_authorization_updates', 'amazon_payments_process_queued_refunds', 'aggregate_sales_report_tax_data', 'weltpixel_backend_moduleinfoupdate', 'yosto_arp_rule_apply_all')) AND (created_at < '2018-07-22 12:20:02')

What this means? I didn't have such issues before updating to 2.2.5

AgentGod commented 6 years ago

So, I had a lot of pending jobs, deleted manually all of them and everything seems to be ok for day or two, but I started to receive the error like above again. When I have checked, I had a lot of pending entries again at the end of the table, manually deleted them, for now it's ok

gnuzealot commented 6 years ago

There was another blocking query running at the same time it was doing a delete and it locked the table. Most likely it was the query that it uses to update job statuses. That query always runs long and if there's a several running it deadlocks.

Also is there any good reason why all the job codes need to be called out in that delete?

rickschippers commented 6 years ago

I see the same behaviour as @AgentGod. Multiple sites start creating deadlock errors after updating to 2.2.5. When I clear the cron_schedule it seems fine for a few days, then it starts again.

nlwstein commented 6 years ago

I believe I have seen this issue happen with an exception being thrown by dotmailer/dotmailer-magento2-extension. We have used composer patches to prevent the known exception codepath from ever being triggered, but this seems like an issue that will likely cause issues on prod servers (as it did for us, unfortunately) so just trying to bump the priority here.

mpropl commented 6 years ago

Where I can find a patch ?

rickschippers commented 6 years ago

@mpropl You can also try the MageMojo extension. I have used on a couple of installations since recently. It seems to fix these issues as well: https://github.com/magemojo/m2-ce-cron

mpropl commented 6 years ago

MageMojo work for me for a while, but now it hangs and does not perform cron tasks :(

kandy commented 6 years ago

MageMojo use file locks, so it will not work on the multinode installation

ericvhileman commented 6 years ago

@kandy you shouldn't be running the cron on multiple nodes. Cron should be scheduled on one node and there is not an issue with file locks.

ericvhileman commented 6 years ago

@mpropl we're going to need more information about your issue if you would like us to help: https://github.com/magemojo/m2-ce-cron/issues/51

kandy commented 6 years ago

@ericvhileman I does not think that use cron service on only one node is high-available solution. Also, this extension requires to run on unix-like OS and have to access to /proc/ filesystem.

ericvhileman commented 6 years ago

@kandy dude... the cron docs LITERALLY say not to do this: "In a multi-node system, crontab can run on only one node. This applies to you only if you set up more than one webnode for reasons related to performance or scalability.". I can also think of many reasons why it's a terrible idea.

Source: https://devdocs.magento.com/guides/v2.2/config-guide/cli/config-cli-subcommands-cron.html

miguelbalparda commented 6 years ago

+1 to what @ericvhileman said about running crons only in one node.

Ctucker9233 commented 6 years ago

@magento-engcom-team Please reopen. This is not resolved.

kandy commented 6 years ago

@Ctucker9233, Can I ask you to create new issues with concrete steps to reproduce? This ticket has many different issues described in the comments, so it hard to understand what concrete issues do you ask to resolve.

acurvers commented 6 years ago

@magento-engcom-team Please reopen. This is not resolved. reproduced in 2.2.6

SELECT count() FROM cron_schedule WHERE status='pending'; +----------+ | count() | +----------+ | 11888 | +----------+

johnny-longneck commented 6 years ago

I can confirm this. Also 11000+ records in the table. One day after I truncated the table. Also 2.2.6

kandy commented 6 years ago

@acurvers @johnny-longneck Can you verify the settings for cron history lifetime is set to 60min, not to 1 week? It configured in SYSTEM CONFIGURATION > ADVANCED > System > Cron (Scheduled Tasks)

johnny-longneck commented 6 years ago

I had overwritten previous history lifetime to 740 as recommended somewhere. I switched back to "use global settings" which is now 60. This might seem to have resolved the table overload.

Cron Table now constant at ca. 1000 entries.

Ctucker9233 commented 5 years ago

@kandy I did but it was closed as a duplicate

paras89 commented 5 years ago

@acurvers @johnny-longneck Can you verify the settings for cron history lifetime is set to 60min, not to 1 week? It configured in SYSTEM CONFIGURATION > ADVANCED > System > Cron (Scheduled Tasks)

@kandy Which setting is this? Failure history lifetime?

toweringmedia commented 5 years ago

I have been stuck on this same issue for several weeks. I think I finally got it figured out with the help of this post. I made a quick blog post about it so I can refer to it in the future. https://toweringmedia.com/blog/magento-2-cron-job-pending-jobs-never-cleared-solved/

vincentteyssier commented 4 years ago

Same issue in 2.3.3. After 24 hours on a clean install, the cron table is 4080 lines with 410 jobs pending

markojak commented 4 years ago

Why is this closed when it's an on-going issue?

gwharton commented 4 years ago

@vincentteyssier Sounds normal behaviour to me. Default behaviour is every 15 minutes, magento will schedule in the next 20 minutes of cron jobs into the cron_schedule table. These will be in the pending state. Take a look at the times of your pending jobs. Are they all scheduled to run in the next 15 minutes or so. Once run they will change to success/failed status and will eventually be cleared. All of the lifetime timings of cron jobs in the cron_schedule table can be configured in admin config. With a fairly standard setup, you should see the cron_schedule table hover between 1000-4000 entries with a ratio of pending/successful jobs of between 25 and 75% depending on where in the 20 minute cycle you are.

@markojak If you believe this is still happening in 2.3-develop, then please create a new issue.

vincentteyssier commented 4 years ago

@gwharton you're right, after 24 more hours it is stable between 4k-4.1k and approx 300 pending

hewgim commented 4 years ago

Is this issue actually fixed. I've just had two separate development servers use up over 1 million CPU seconds in a few hours as a result of the magento cron, on a pretty much vanilla installation of Magento 2.3.3 with 30 products and zero activity apart from the cron running

One server had 10000 rows in the cron schedule table and the other had just over 1000 (after I had changed the save history lifetime setting). I'm not sure I'm seeing the same issue that other people see with hundreds of thousands of jobs stacked up in the cron_schedule table so I'm not sure if I need to raise a separate ticket.

In the error logs the high cpu usage seemed to coincide with the database server rebooting and a high memory usage when creating a temporary table.

Looking through this thread it seems to me that the cron module does a huge amount of unnecessary work and it also seems to lack necessary indexes to make its queries fast. For example the clean up of missed jobs looks at the created_at column but this is not indexed so the query has to table scan. You can see how the cron grinds to a halt once the table gets large. Doesn't look like this is my problem though.

hewgim commented 4 years ago

On further investigation it appears that my problem of excessive cpu usage is due to a bug in magento queue consumer logic in 2.3 that means they never complete. see https://github.com/magento/magento2/issues/17951

Zyles commented 4 years ago

Probably not fixed. M2 devs loves to close issues because they are "task oriented" and have to fill their daily quota of finished tasks (without actually fixing anything).

However this problem is still present and we are having increase CPU usage over time with cronjobs and mysql slow queries.

gwharton commented 4 years ago

I've not seen this specific bug in a long time on my production install. Currently on 2.3.3.

Please raise a new issue with full steps to reproduce if problem still persists for you.

Zyles commented 4 years ago

So am I.

https://github.com/magento/magento2/issues/26507

LiamKarlMitchell commented 4 years ago

Still a problem at least in 2.3.0

zzvara commented 4 years ago

Still a problem in 2.3.4:

MariaDB [db]> SELECT count(*) FROM cron_schedule WHERE status='pending';
+----------+
| count(*) |
+----------+
|      261 |
+----------+
1 row in set (0.007 sec)
fooman commented 4 years ago

@zzvara see above comment https://github.com/magento/magento2/issues/11002#issuecomment-556889494 261 is within normal range - I'd suggest monitoring this over time to see if this increases.

zzvara commented 4 years ago

I'm not sure. Magento is already using 12/24 cores constantly. I'll post more data for you to tackle down the problem. We are using Kubernetes and HelmReleases, it should be fairly easy to reproduce.

zzvara commented 4 years ago

Not much is running on that Kubelet node, but Magento is using 12/24 cores.

image

The site itself has become incredibly slow as well. These CRON jobs are running all the time. They are all on default values of 2.3.4.

We are using the following deployment: https://github.com/bitnami/charts/tree/master/bitnami/magento

The following Dockerfile that contains all plugins:

FROM bitnami/magento:2.3.4-debian-10-r12

USER root

RUN apt update
RUN apt install wget -y
RUN apt install rsync -y
RUN apt install vim -y

ENV COMPOSER_MEMORY_LIMIT -1

# This is the home directory of Magento.
WORKDIR /opt/bitnami/magento/htdocs

ADD auth.json .

RUN chown bitnami:daemon auth.json || exit 1

USER bitnami:daemon

RUN composer require mageplaza/module-core
RUN composer require mageplaza/module-smtp
RUN composer require mageplaza/module-gdpr
RUN composer require mageplaza/magento-2-social-login
RUN composer require dhl/module-rates-express
RUN composer require mageplaza/magento-2-hungarian-language-pack:dev-master

COPY extensions/geo-ip-redirect-1.3.0-CE/app/code/Amasty /opt/bitnami/magento/htdocs/app/code/Amasty

USER root

RUN chown -R bitnami:daemon app
RUN chown -R bitnami:daemon pub

RUN find /opt/bitnami/magento/htdocs -type d -print0 | xargs -0 chmod 775 && find /opt/bitnami/magento/htdocs -type f -print0 | xargs -0 chmod 664 && chown -R bitnami:daemon /opt/bitnami/magento/htdocs

Dockerfile is frequently rebuilt and the container therefore restarted.

Here is the entry point:

echo "Adding Magento core cron entries."
ln -sf /opt/bitnami/magento/conf/cron /etc/cron.d/magento

/usr/sbin/cron

echo "Running upgrade on Magento."
sudo -u bitnami php -d memory_limit=4096M /opt/bitnami/magento/htdocs/bin/magento setup:upgrade -vvv || true

echo "Running compile on Magento."
sudo -u bitnami php -d memory_limit=4096M /opt/bitnami/magento/htdocs/bin/magento setup:di:compile -vvv || true

echo "Running deployment on Magento sources."
sudo -u bitnami php -d memory_limit=4096M /opt/bitnami/magento/htdocs/bin/magento setup:static-content:deploy -f -vvv || true

echo "Install and start Cron."
sudo -u bitnami php bin/magento cron:install || true

echo "Maybe adding custom cron entries."
if [ -f "/opt/bitnami/magento/htdocs/custom-cron-entries" ]; then
  echo "Found custom cron entries, so adding it to 'crontab -e'."
  echo "Custom cron entries are: "
  cat /opt/bitnami/magento/htdocs/custom-cron-entries
  runuser -l bitnami -c '(crontab -l 2>/dev/null; cat /opt/bitnami/magento/htdocs/custom-cron-entries) | crontab -'
fi

echo "Starting Apache..."
exec httpd -f /opt/bitnami/apache/conf/httpd.conf -D FOREGROUND
root@magento-788c7d54f-kxkx4:/opt/bitnami/magento/htdocs# cat /etc/cron.d/magento
* * * * * bitnami /opt/bitnami/php/bin/php /opt/bitnami/magento/htdocs/bin/magento cron:run

* * * * * bitnami /opt/bitnami/php/bin/php /opt/bitnami/magento/htdocs/update/cron.php

* * * * * bitnami /opt/bitnami/php/bin/php /opt/bitnami/magento/htdocs/bin/magento setup:cron:run
root@magento-788c7d54f-kxkx4:/opt/bitnami/magento/htdocs# cat /opt/bitnami/magento/htdocs/custom-cron-entries

0 1 * * * redacted_copy_backups_to_remote_server.sh
kandy commented 1 year ago

@marthasimons7777 Can I suggest you upgrade to supported version of Commerce where this issue is resolved?