magento / community-features

Magento Features Development is an initiative that allows community members to join the development of Magento features

Message queue consumer improvements to allow them to waste less resources #180

Open hostep opened 4 years ago

hostep commented 4 years ago

Hi folks

Since Magento 2.3.2 got released, we are now seeing at least 4 consumer processes running on the servers for every shop, for example:

php bin/magento queue:consumers:start product_action_attribute.website.update --pid-file-path=product_action_attribute.website.update-hostname.pid --max-messages=10000
php bin/magento queue:consumers:start product_action_attribute.update --pid-file-path=product_action_attribute.update-hostname.pid --max-messages=10000
php bin/magento queue:consumers:start codegeneratorProcessor --pid-file-path=codegeneratorProcessor-hostname.pid --max-messages=10000
php bin/magento queue:consumers:start exportProcessor --pid-file-path=exportProcessor-hostname.pid --max-messages=10000

According to the release notes of 2.3.2, these are meant to handle potentially long-running tasks triggered in the backend of Magento like:

I totally understand the idea and it's a good idea. But... We usually work with shops which are very small, contain few products, and have store owners who don't update their product catalog very often.

So in these cases of smaller shops, we still have these processes running, taking up resources (both cpu and memory) but not doing anything. If a store owner only uses the mass update of products like once a year, we have a process running for 364 days basically doing nothing except for wasting resources. Which isn't really great or efficient.

I would therefore like to suggest some ideas which might help with this:

It would be great if these new configuration options could be exposed in the adminhtml. I would also like to see this --max-messages being configurable in the adminhtml per consumer type, which currently doesn't seem to be the case.

What are people's thoughts on this, does this request make sense?

nuzil commented 4 years ago

1) The --max-idle-time option may kill the process, but the process will be started again by the next cron run, so it will not really solve your issue.

2) Regarding "Having an option per consumer type (I think that's called a topic?) where we can configure it to only start when at least one message is in the queue": here we rely on the architecture. In the case of RabbitMq we have to detect when a new message has appeared in the queue, which means that on each cron run we would make a request to the RabbitMq queue to check for messages. That is basically not a consumer anymore; it would look like a usual cronjob.

hostep commented 4 years ago

Thanks for the feedback @nuzil!

  1. I saw it as those two new options being used simultaneously to resolve the problem; --max-idle-time alone wouldn't resolve it, indeed.

  2. I'm not familiar yet with how RabbitMq works exactly. I already tried to get it working with these new asynchronous features from Magento 2.3.2 but failed: no messages from these features seemed to arrive in RabbitMq. Maybe I did something wrong during testing, or maybe it means that these features can't work with RabbitMq out of the box at this time. Also: are the Magento consumer processes needed when using RabbitMq? That isn't really clear to me yet.

Do you have some other suggestions to work around these Magento consumer processes always-running-and-not-doing-anything problem?
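
To make the combination from point 1 concrete, here is a minimal sketch in Python (not Magento code). It assumes an in-memory queue as a stand-in for the real one; `run_consumer`, `cron_tick`, and their parameters are hypothetical names. The consumer stops after `max_messages` or after sitting idle for `max_idle_time`, and the cron entry point only starts it when a message is waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


def run_consumer(
    queue: Deque[str],
    handle: Callable[[str], None],
    max_messages: int,
    max_idle_time: float,
    poll_interval: float = 0.01,
) -> str:
    """Process messages until a limit is hit; return the reason for stopping."""
    processed = 0
    last_activity = time.monotonic()
    while processed < max_messages:
        if queue:
            handle(queue.popleft())
            processed += 1
            last_activity = time.monotonic()
        elif time.monotonic() - last_activity >= max_idle_time:
            return "idle"  # nothing arrived for max_idle_time seconds
        else:
            time.sleep(poll_interval)
    return "max_messages"


def cron_tick(
    queue: Deque[str],
    handle: Callable[[str], None],
    max_messages: int,
    max_idle_time: float,
) -> Optional[str]:
    """One cron run: start the consumer only if a message is waiting."""
    if not queue:
        return None  # hypothetical "only start when needed" option: no process started
    return run_consumer(queue, handle, max_messages, max_idle_time)
```

With only the idle timeout, the next cron run would start the process again; with only the "start when needed" check, a started process would never stop. Together, an empty queue means no long-lived process at all.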

nuzil commented 4 years ago

Yep, I would recommend just disabling the jobs you don't need. Add this to your env.php:

'cron_consumers_runner' => [
    'cron_run' => true,
    'max_messages' => 20000,
    'consumers' => [
        'async.operations.all',
        'product_action_attribute.website.update',
    ],
],

In consumers, list only the ones you need, OR set cron_run to false.

If you want them to work but the messages are not landing in the queue, please check whether you have a connection to RabbitMq in your env.php file; if not, add it and then run at least setup:upgrade to create all queues in Rabbit.

Also: are the Magento consumer processes needed when using RabbitMq, that isn't really clear to me yet?

They are needed in any case if you want to use the queue functionality, no matter whether it's DB or RabbitMQ.

Do you have some other suggestions to work around these Magento consumer processes always-running-and-not-doing-anything problem?

Currently, unfortunately not. You have to kill the processes yourself, which I guess you are already doing.

hostep commented 4 years ago

Yep, I would recommend just disabling the jobs you don't need.

We can't just disable those consumers if there is a chance those async features will be used someday.

Suppose we disable them, then after a few months a client is calling us complaining he wants to use the export functionality in Magento and it's not working. Then we have to enable them again, wait until the client is done and then disable them again. That's not really a workable solution here I think? Should be a bit more automated/smarter.

If you want them to work but the messages are not landing in the queue, please check whether you have a connection to RabbitMq in your env.php file; if not, add it and then run at least setup:upgrade to create all queues in Rabbit.

Is RabbitMQ a drop-in replacement for DB queues? I was under the impression you could say in some xml file which one should be used (e.g. the queue xml files in the catalog module, where everything seems to be hardcoded to use db?). I had RabbitMQ set up correctly as far as I was aware, but messages still ended up in the DB. Anyway, I will do some more testing when I find some more time. Where is the most appropriate place to report bugs against this (should I encounter any)? Just the normal m2 repo, the async-import repo, or someplace else?

nuzil commented 4 years ago

1) Yes, in this case, if you really need them, they have to stay enabled, and this means that for now at least you have to "kill" the processes if you want to restart them.

2) Whether DB or RabbitMq is used is defined in the queue_topology.xml file for each queue. Unfortunately, switching a queue between connections still requires XML modifications in your own module. Currently I cannot tell why it was made this way, because I am more a fan of a correct fallback system. But if a queue is defined with the DB connection, it will be accessible via DB, and the other way around.

hostep commented 4 years ago

Ok thanks for the further explanation!

  1. I think you are more talking about https://github.com/magento/community-features/issues/181 here, let's not confuse these two feature requests, they are completely different :)

  2. I'm coming back to your remark from earlier:

    Here we rely on the architecture. In the case of RabbitMq we have to detect when a new message has appeared in the queue, which means that on each cron run we would make a request to the RabbitMq queue to check for messages. That is basically not a consumer anymore; it would look like a usual cronjob.

So to avoid having to call RabbitMQ to see if there are new messages, we could do this only for DB queues, since we can probably figure out whether a consumer uses RabbitMq or the DB queueing system. Would it then make sense for those two new options I'm suggesting to only work on consumers which use the DB queueing system and not when RabbitMQ is being used? Not sure if I'm a fan of this though, since it makes the difference in logic between RabbitMQ and DB queueing even bigger.
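
A minimal Python sketch of that "DB queues only" rule, purely for illustration. `ConsumerConfig`, `only_spawn_when_message_available`, and `should_start` are hypothetical names, and the connection labels "db" and "amqp" are assumptions about how a consumer's connection would be identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumerConfig:
    name: str
    connection: str  # assumed labels: "db" or "amqp"
    only_spawn_when_message_available: bool = False  # hypothetical new option


def should_start(config: ConsumerConfig, pending_db_messages: int) -> bool:
    """Decide on a cron run whether to start the consumer process."""
    if config.connection != "db":
        return True  # RabbitMQ consumers keep the current behaviour: always start
    if not config.only_spawn_when_message_available:
        return True  # option not enabled: always start
    return pending_db_messages > 0  # DB queue: start only when work is waiting
```

The extra branch for non-DB connections is exactly the divergence between the two queueing systems that the paragraph above is hesitant about.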

Unfortunately, switching a queue between connections still requires XML modifications in your own module. Currently I cannot tell why it was made this way, because I am more a fan of a correct fallback system.

Yeah, this is a bit surprising, weird that it got implemented this way.


So, what I'm searching for is some kind of solution to not waste precious resources of a server when no messages appear in the queue for weeks or months. And my initial post had some suggestions, but it might not be the best option here.

Anybody with some other ideas?

FreekVandeursen commented 4 years ago

Another part of the solution could be to make sure the continuously running process doesn't take up so many resources. After all: if a script has only a small resource footprint, it isn't so bad if it is running all the time. Would it be possible to create a small running script, without loading all of Magento's overhead, to check for new messages, and only launch the full Magento stuff as soon as new messages appear?
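
As a rough illustration of this idea, here is a Python sketch of such a lightweight watcher. It is an assumption-laden outline, not an existing tool: `watch`, `consumer_command`, and the `count_pending` callable (which would have to query the queue cheaply, without bootstrapping Magento) are hypothetical. The launched command is the same `queue:consumers:start` invocation shown at the top of this issue.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def consumer_command(name: str, max_messages: int) -> List[str]:
    """Build the full Magento consumer command for one consumer."""
    return ["php", "bin/magento", "queue:consumers:start", name,
            f"--max-messages={max_messages}"]


def watch(
    consumers: Sequence[str],
    count_pending: Callable[[str], int],  # hypothetical cheap queue check
    launch: Callable[[List[str]], object] = subprocess.run,
    max_messages: int = 10000,
    interval: float = 60.0,
    iterations: int = 1,
) -> int:
    """Poll each consumer's queue; launch the full consumer only when needed."""
    launched = 0
    for i in range(iterations):
        for name in consumers:
            if count_pending(name) > 0:
                launch(consumer_command(name, max_messages))
                launched += 1
        if i + 1 < iterations:
            time.sleep(interval)  # wait before the next polling round
    return launched
```

Note that the launched consumer still needs its own stop condition; with only `--max-messages` it would keep running after the queue is drained.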

hostep commented 4 years ago

@nuzil: thought about it for a bit more, since these new options I was suggesting would be optional and configurable, do you still have objections against them? And we could also set some sensible defaults per consumer type?

@FreekVandeursen: thanks for thinking along, what you are suggesting might be an option. But then there would still need to be a feature that stops the "full Magento stuff" after a while.


Also: I just accidentally bumped into the consumers from the symfony/messenger component, and noticed they also have some interesting options per consumer which Magento could maybe consider: https://github.com/symfony/messenger/blob/3d65f22f9a56f6475c19999fdbc3a897cefc8900/Command/ConsumeMessagesCommand.php#L77-L80