magemojo / m2-ce-cron

Magento 2 cron project to fix bugs and common cron issues.
https://magemojo.com
MIT License
165 stars, 45 forks

What benefit does this module have for queue consumers exactly? #101

Closed: erfanimani closed this issue 4 years ago

erfanimani commented 4 years ago

In version 1.3, fixes are implemented for the consumers_runner cron job. This job code is a throwback from Magento 1 and is more frequently used in Magento 2.3.

As far as I know, Magento 1 (even Enterprise) didn't have queueing functionality, so I'm not sure what this is referring to

It runs under its own scheduler which can execute many child jobs and bomb the system.

It doesn't really have a scheduler. It simply checks every 5 minutes whether the consumer is running, and if it's not, it starts the consumer. The consumer process generally never stops, apart from when it hits its max messages limit (10,000 by default). If max messages is set to 0, the consumer process never stops. If set up properly, this shouldn't cause any issues, let alone "bomb the system".

For Magento 2.3.2 and below, Magento uses a PID file to check for running processes. For newer versions, MySQL locking functions are used (so a new deployment that wipes out the PID file doesn't cause duplicate consumer processes).
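To make the behaviour described above concrete, here is a minimal Python sketch of that kind of check. It is not Magento's code: `Consumer`, `Runner`, `tick`, and the consumer name are hypothetical, and an in-memory "is it running" check stands in for the PID file or MySQL lock.

```python
from dataclasses import dataclass


@dataclass
class Consumer:
    """Hypothetical stand-in for a long-running queue consumer process."""
    name: str
    max_messages: int = 10_000  # 0 means: never exit because of a message count
    processed: int = 0
    running: bool = False

    def handle(self, count: int) -> None:
        """Process up to `count` messages, exiting once max_messages is reached."""
        for _ in range(count):
            if not self.running:
                break
            self.processed += 1
            if self.max_messages and self.processed >= self.max_messages:
                self.running = False  # process exits; the next check restarts it


class Runner:
    """Hypothetical stand-in for the periodic cron check."""

    def __init__(self, names, max_messages=10_000):
        self.max_messages = max_messages
        self.consumers = {name: None for name in names}

    def is_locked(self, name: str) -> bool:
        # Stands in for the PID file / MySQL lock lookup (an assumption).
        consumer = self.consumers[name]
        return consumer is not None and consumer.running

    def tick(self) -> list:
        """One cron run: start every consumer that is not already running."""
        started = []
        for name in self.consumers:
            if self.is_locked(name):
                continue  # already running, so do not start a duplicate
            self.consumers[name] = Consumer(name, self.max_messages, running=True)
            started.append(name)
        return started
```

Calling `tick()` twice in a row starts each consumer only once. A consumer is started again only after it has exited, for example after reaching its message limit.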

In this version of the module this parent job is intercepted and written as individual jobs in the cron_schedule table and then run in a sane manner from there.

👍

These consumer jobs can also go into infinite loops, so a timeout of 30 seconds is imposed on them by default. This setting can be adjusted in the admin.

An infinite loop sounds like a bug somewhere else in the code. If "infinite loop" refers to how long a consumer stays alive, then again, consumer jobs are supposed to run forever, irrespective of cron. It's cron's responsibility to ensure the processes exist.

If cron checks for the existence of the process every 5 minutes, and there is a default timeout of 30 seconds, then there is a window of 4 minutes 30 seconds in every cycle during which the consumer processes aren't running. If webhooks, emails or other important tasks use a queue approach, that's a delay of up to 4 minutes 30 seconds.
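The arithmetic behind that window, as a small Python sketch. The function names are made up for illustration, and it assumes the consumer is started at each check and killed exactly at the timeout.

```python
def worst_case_gap(check_interval_s: int, timeout_s: int) -> int:
    """Seconds per check interval during which no consumer is running.

    Assumes the consumer starts at each check and is killed at the timeout.
    """
    return max(check_interval_s - timeout_s, 0)


def as_minutes(seconds: int) -> str:
    """Format a number of seconds as M:SS."""
    minutes, rest = divmod(seconds, 60)
    return f"{minutes}:{rest:02d}"


# 5-minute check with the default 30-second timeout
print(as_minutes(worst_case_gap(300, 30)))   # 4:30
# 5-minute check with a 240-second timeout
print(as_minutes(worst_case_gap(300, 240)))  # 1:00
```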

On MageMojo Stratus, with the max timeout set to 4 minutes (240 seconds), I'm getting hundreds of emails and support tickets with the subject "Runaway Processes on Stratus Instance" every single day. So it looks like this "new" approach doesn't even check whether the consumer is already running; it just starts it every 5 minutes regardless. And when the occasional overlap does occur, Stratus kills off the process.

On top of this, when the consumer process reaches its max lifetime or overlaps, it is abruptly killed, possibly stopping whatever job it was processing at the time — which could cause state issues in the application or missed jobs.

It sounds like the philosophy behind queue consumers wasn't really understood. In certain ways, this module makes Magento worse. It went backwards to cron based processing instead of consumer based processing.

gnuzealot commented 4 years ago

The code that executes those processes is repurposed M1 code. It was repurposed for queuing, and the main problem is that, as a result, it essentially runs in its own cron scheduler. It fires up a ton of jobs and forgets about them. Then they can lock up and run forever. The consumer processes are, and always have been, cron processes. They are designed to run and complete, but sometimes do not. There are many complaints about this functionality. The PID or MySQL tracking has also never worked properly, in M1 or M2, which is another common complaint that the module solves. There are fundamental problems with the consumer processes, which is what this module is trying to mitigate, so certainly there are problems elsewhere in the code. The module is not trying to fix them; it's trying its best to manage a broken system. We have frequently seen these processes use massive CPU power running infinite loops.

The module does not wait 5 minutes to check whether jobs are queued and need to be run. It checks every 20 seconds, so it is faster than the standard cron in that regard, provided there is a job queued to run for these consumers. This approach definitely does check whether processes are already running. I checked the tickets you received, and the processes were not overrunning; they were all different consumer jobs. The issue was that the setting for the number of concurrent jobs had been changed, so the system saw too many jobs running and threw the alert.

erfanimani commented 4 years ago

Thanks for the clarification. I'll close it, but maybe it would be a good suggestion to add some links to related magento/magento2 bug reports/issues.