magento / magento2

Prior to making any Submission(s), you must sign an Adobe Contributor License Agreement, available here at: https://opensource.adobe.com/cla.html. All Submissions you make to Adobe Inc. and its affiliates, assigns and subsidiaries (collectively “Adobe”) are subject to the terms of the Adobe Contributor License Agreement.
http://www.magento.com
Open Software License 3.0
11.57k stars 9.32k forks source link

Memory leak? in asynchronous message processing #38418

Open ioweb-gr opened 9 months ago

ioweb-gr commented 9 months ago

Preconditions and environment

Steps to reproduce

I'm not exactly sure how to address this, if it's a bug or not, but it stems from my initial observation of a memory leak in #34851 and #37639 In my task of creating an xml feed for 3 million products I observed #34851 and #37639 so I thought of an idea to break the operation into chunks of batches that can be processed asynchronously via queue messages.

I decided to follow this approach

Step 1. Gather all product IDs Step 2. Publish a message with let's say 3m IDs Step 3. In the handler process 1000 of them, then publish a message with (3m - 1000) product IDs until the queue is exhausted.

I thought that since it's an asynchronous operation, the handler will be invoked for every single message and terminate so the batch would be processed then terminate as php is an interpreter language so when one php process terminates it will release all memory used, thus give precise control of memory limits.

However in reality I observed this is not the case and it suffers from the same problems a foreach loop suffers.

To replicate this I created a mini module which in the handler, sets an array and variable inside the array.

replicatememoryissue.zip

Find attached the module which has:

  1. a cli command php bin/magento ioweb:replicatememorymessage which will set 100 messages in the queue
  2. a handler that will allocate some memory in an array
  3. a definition for a queue

Try to now execute

php bin/magento ioweb:replicatememorymessage

and then

php bin/magento que:co:st replicatememoryissue

You will notice that it outputs an ever increasing memory usage

``` Memory usage: 130.70127105713 MB Memory usage: 140.70547485352 MB Memory usage: 150.70964813232 MB Memory usage: 160.71382141113 MB Memory usage: 170.71799468994 MB Memory usage: 180.72216796875 MB Memory usage: 190.72634124756 MB Memory usage: 200.73051452637 MB Memory usage: 210.73499298096 MB Memory usage: 220.73916625977 MB Memory usage: 230.74333953857 MB Memory usage: 240.74751281738 MB Memory usage: 250.75168609619 MB Memory usage: 260.755859375 MB Memory usage: 270.76003265381 MB Memory usage: 280.76420593262 MB Memory usage: 290.76898956299 MB Memory usage: 300.7731628418 MB Memory usage: 310.77733612061 MB Memory usage: 320.78150939941 MB Memory usage: 330.78568267822 MB Memory usage: 340.78985595703 MB Memory usage: 350.79402923584 MB Memory usage: 360.79820251465 MB Memory usage: 370.80237579346 MB Memory usage: 380.80654907227 MB Memory usage: 390.81072235107 MB Memory usage: 400.81489562988 MB Memory usage: 410.81906890869 MB Memory usage: 420.8232421875 MB Memory usage: 430.82741546631 MB Memory usage: 440.83158874512 MB Memory usage: 450.83698272705 MB Memory usage: 460.84115600586 MB Memory usage: 470.84532928467 MB Memory usage: 480.84950256348 MB Memory usage: 490.85367584229 MB Memory usage: 500.85784912109 MB Memory usage: 510.8620223999 MB Memory usage: 520.86619567871 MB Memory usage: 530.87036895752 MB Memory usage: 540.87454223633 MB Memory usage: 550.87871551514 MB Memory usage: 560.88288879395 MB Memory usage: 570.88706207275 MB Memory usage: 580.89123535156 MB Memory usage: 590.89540863037 MB Memory usage: 600.89958190918 MB Memory usage: 610.90375518799 MB Memory usage: 620.9079284668 MB Memory usage: 630.91210174561 MB Memory usage: 640.91627502441 MB Memory usage: 650.92044830322 MB Memory usage: 660.92462158203 MB Memory usage: 670.92879486084 MB Memory usage: 680.93296813965 MB Memory usage: 690.93714141846 MB Memory usage: 700.94131469727 MB Memory usage: 710.94548797607 MB Memory usage: 720.94966125488 MB Memory usage: 730.95383453369 MB Memory usage: 740.9580078125 MB Memory usage: 750.96218109131 MB Memory usage: 760.96635437012 MB Memory usage: 770.97589874268 MB Memory usage: 780.98007202148 MB Memory usage: 790.98424530029 MB Memory usage: 800.9884185791 MB Memory usage: 810.99259185791 MB Memory usage: 820.99676513672 MB Memory usage: 831.00093841553 MB Memory usage: 841.00511169434 MB Memory usage: 851.00928497314 MB Memory usage: 861.01345825195 MB Memory usage: 871.01763153076 MB Memory usage: 881.02180480957 MB Memory usage: 891.02597808838 MB Memory usage: 901.03015136719 MB Memory usage: 911.034324646 MB Memory usage: 921.0384979248 MB Memory usage: 931.04267120361 MB Memory usage: 941.04684448242 MB Memory usage: 951.05101776123 MB Memory usage: 961.05519104004 MB Memory usage: 971.05936431885 MB Memory usage: 981.06353759766 MB Memory usage: 991.06771087646 MB Memory usage: 1001.0718841553 MB Memory usage: 1011.0760574341 MB Memory usage: 1021.0802307129 MB Memory usage: 1031.0844039917 MB Memory usage: 1041.0885772705 MB Memory usage: 1051.0927505493 MB Memory usage: 1061.0969238281 MB Memory usage: 1071.1010971069 MB Memory usage: 1081.1052703857 MB Memory usage: 1091.1094436646 MB Memory usage: 1101.1136169434 MB Memory usage: 1111.1177902222 MB Memory usage: 1121.121963501 MB ```

The problem here is that even with queues we cannot guarantee that if any part of the framework is consuming and not cleaning up memory, will release the memory when a batch is processed thus leaving us to the mercy of all memory leaks in Magento.

Expected result

Memory usage is not increasing per message as the allocation is consistent

Actual result

Memory usage keeps increasing

Additional information

No response

Release note

No response

Triage and priority

m2-assistant[bot] commented 9 months ago

Hi @ioweb-gr. Thank you for your report. To speed up processing of this issue, make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, Add a comment to the issue:

m2-assistant[bot] commented 9 months ago

Hi @engcom-November. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

engcom-November commented 9 months ago

Hello @ioweb-gr,

Thank you for the report and collaboration!

Verified this issue on 2.4-develop. With the provided module we were able to reproduce this issue.

Please take a look at the screenshot below:

Screenshot 2024-02-06 at 3 48 01 PM

As mentioned in the actual result, the memory keeps increasing for every iteration.

Hence confirming the issue.

Thank you.

github-jira-sync-bot commented 9 months ago

:white_check_mark: Jira issue https://jira.corp.adobe.com/browse/AC-10993 is successfully created for this GitHub issue.

m2-assistant[bot] commented 9 months ago

:white_check_mark: Confirmed by @engcom-November. Thank you for verifying the issue.
Issue Available: @engcom-November, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

hostep commented 9 months ago

I don't think this is a bug, because the provided module is build in such a way that it will keep increasing memory because $this->array will continue to grow and is never cleared out, it makes sense that the used memory will build up over time.

@ioweb-gr: if you purposely introduce a memory leak than that bug is in your module and not in core Magento. So you should make sure you cleanup the memory used at the end of your execute method.

But if you are worried about consumers running out of memory (due to core magento bugs or something), than you could lower the maxMessages value, so that the consumer will kill itself and start up again fresh after it has processed a certain number of messages. Which would be a workaround for a memory leak (but ideally you should try to avoid memory leaks in your consumers if possible).

ioweb-gr commented 9 months ago

@hostep In magento all collection loops and SearchResults loops for mass processing products, orders, shipments etc etc etc they all suffer from memory leaks. I've opened up at least 3 issues. For those, one recommendation was to move them to an asynchronous queue so that they would be processed in batches with a new instance.

However, the module is built in this way to indicate that handler class is initiated once and reused constantly, ever-increasing memory usage due to the inner allocations that don't get cleared. So if that handler class were to loop 1000 products for example in a catalog of 3m products, and it would process 10 messages for example, it would then still hit limits as if I was writing a simple foreach loop without a message queue.

So basically the idea that message queues would remove the excess memory usage throught their batches is invalidated by this simple module.

I could create it with looping something like in #37639 for example if that would help.

The problem in my opinion, is that message queues do not release memory after processing a message like spawning a sub-process to process the message which would terminate and release it for example.

They're just appearing as a glorified paginator running on the same loop depending on the maxMessages limit if you get my analogy.

hostep commented 9 months ago

Hmm, it kind of makes sense your remarks.

I had a bit of fun today and adapted your module in a certain way to experiment with factories and marking objects as shared or not via di.xml, here's what I came up with:

diff -ruN originalModule/Model/Queue/Handler/Handler.php newModule/Model/Queue/Handler/Handler.php
--- originalModule/Model/Queue/Handler/Handler.php    2024-02-07 19:58:09
+++ newModule/Model/Queue/Handler/Handler.php 2024-02-07 19:03:29
@@ -7,11 +7,13 @@

 class Handler
 {
-    protected $array = [];
+    public function __construct(
+        private InternalHandlerFactory $internalHandlerFactory,
+    ) { }
+
     public function execute()
     {
-        $uniqueId = uniqid();
-        $this->array[$uniqueId] = str_repeat('A', 1024 * 1024 * 10);
-        echo "Memory usage: " . memory_get_usage() / 1024 / 1024 . " MB\n";
+        $internalHandler = $this->internalHandlerFactory->create();
+        $internalHandler->execute();
     }
 }
diff -ruN originalModule/Model/Queue/Handler/InternalHandler.php newModule/Model/Queue/Handler/InternalHandler.php
--- originalModule/Model/Queue/Handler/InternalHandler.php    1970-01-01 01:00:00
+++ newModule/Model/Queue/Handler/InternalHandler.php 2024-02-07 19:56:20
@@ -0,0 +1,24 @@
+<?php
+
+namespace Ioweb\ReplicateMemoryIssue\Model\Queue\Handler;
+
+class InternalHandler
+{
+    public function __construct(
+        private TestObject $testObject,
+    ) {}
+
+    private $array = [];
+
+    public function execute()
+    {
+        $uniqueId = uniqid();
+        $this->array[$uniqueId] = str_repeat('A', 1024 * 1024 * 10);
+
+        $memoryUsage = memory_get_usage() / 1024 / 1024;
+        $testObjectId = spl_object_id($this->testObject);
+        $testObject2Id = spl_object_id($this->testObject->testObject2);
+
+        echo sprintf("Memory usage: %f MB, Test object id: %d, Test object 2 id: %d\n", $memoryUsage, $testObjectId, $testObject2Id);
+    }
+}
diff -ruN originalModule/Model/Queue/Handler/TestObject.php newModule/Model/Queue/Handler/TestObject.php
--- originalModule/Model/Queue/Handler/TestObject.php 1970-01-01 01:00:00
+++ newModule/Model/Queue/Handler/TestObject.php  2024-02-07 19:55:31
@@ -0,0 +1,10 @@
+<?php
+
+namespace Ioweb\ReplicateMemoryIssue\Model\Queue\Handler;
+
+class TestObject
+{
+    public function __construct(
+        public TestObject2 $testObject2,
+    ) {}
+}
diff -ruN originalModule/Model/Queue/Handler/TestObject2.php newModule/Model/Queue/Handler/TestObject2.php
--- originalModule/Model/Queue/Handler/TestObject2.php    1970-01-01 01:00:00
+++ newModule/Model/Queue/Handler/TestObject2.php 2024-02-07 19:55:46
@@ -0,0 +1,7 @@
+<?php
+
+namespace Ioweb\ReplicateMemoryIssue\Model\Queue\Handler;
+
+class TestObject2
+{
+}
diff -ruN originalModule/etc/di.xml newModule/etc/di.xml
--- originalModule/etc/di.xml 2024-02-07 19:58:09
+++ newModule/etc/di.xml  2024-02-07 19:52:48
@@ -16,4 +16,5 @@
             </argument>
         </arguments>
     </type>
+    <type name="Ioweb\ReplicateMemoryIssue\Model\Queue\Handler\TestObject" shared="false"/>
 </config>

If you execute this, you'll notice this output:

$ bin/magento queue:consumers:start replicatememoryissue --max-messages=15
Memory usage: 80.710022 MB, Test object id: 6086, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6118, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6116, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6073, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6080, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6081, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6084, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6083, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6082, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6086, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6118, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6116, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6073, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6080, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6081, Test object 2 id: 6117
...

Notice that by leveraging a factory, the memory usage stays intact. Also notice that TestObject2 remains the same object in memory during the execution and TestObject is a new instance with every new message being processed (due to marking it as shared=false in the di.xml). This might be a bit counter intuitive, because you might think that if TestObject2 is a dependency of TestObject, it itself should also be a new instance each time, but that's not how Magento's ObjectManager works.

So yeah, I'm just afraid Magento's ObjectManager system is built in such a way that it tries to keep as much in memory for performance reasons so it doesn't need to re-initialise objects all the time. You can however change certain classes to force them to become new instances instances by using Factories or mark types as shared=false.

There is also a good write up of these systems by Alan Storm over here: https://alanastorm.com/magento_2_object_manager_instance_objects/

Not sure if this tells you anything new, but I found it interesting to do this experiment at least :)

Just be aware that different people have different needs, some will want to keep the current system, while you may want a different way it should work, we can't have one system that works for everybody, unless we have some configuration flags somehow to change the behavior.

ioweb-gr commented 9 months ago

I don't think this is a bug, because the provided module is build in such a way that it will keep increasing memory because $this->array will continue to grow and is never cleared out, it makes sense that the used memory will build up over time.

@ioweb-gr: if you purposely introduce a memory leak than that bug is in your module and not in core Magento. So you should make sure you cleanup the memory used at the end of your execute method.

But if you are worried about consumers running out of memory (due to core magento bugs or something), than you could lower the maxMessages value, so that the consumer will kill itself and start up again fresh after it has processed a certain number of messages. Which would be a workaround for a memory leak (but ideally you should try to avoid memory leaks in your consumers if possible).

First of all it's awesome you found the time to actually play around like this. I really appreciate you taking the effort and trust me I understand what you're trying to convey.

On the other hand if you take a look at the internet there are hundreds of questions about the memory consumption issues and I do think that Magento or any application really shouldn't have to consume 20gb of ram to export its catalog for example.

So I do think that if the recommended way for bulk tasks is the asynchronous handlers via message queues then it would be a great addition if there was a flag to adjust the behavior for it not to keep filling up the memory with likely unneeded objects.

I can give you other scenarios where memory will keep increasing with magento without us being able to do anything about it.

Constantly using the resourceConnection to send SQL queries which are buffered which is the default state. The zend library handling the database connection behind the scenes creates an ever increasing memory consumption.

Now we can't really fix zend or laminas in terms of that, so the framework we use should give us a way to work around it at the very least or specific tasks will never realistically be feasible. Of course the best bet would be to track all memory leaks but as you noticed yourself the issue lies in the object manager usually.

Not only object manager but repositories too will keep a cache of every object they retrieve so memory keeps getting abused at every corner in Magento.

I was hoping the queues would work around the limitations by actually forcing the instances to clean up their own mess but to do that I'd have to always set max messages at 1 and have it keep spawning and respawning wasting time and cpu cycles. At least that's how I view this at this point

ioweb-gr commented 8 months ago

@hostep

I'm trying to see how I can put this in practice in loops but without any real success.

Let's say I have a class which is responsible for fetching 100000 product ids, make them into batches of 1000. On the beggining of each batch process action it will use a factory to create a Service class instance, which can process a single item

E.g.

$productCollection = $this->collectionFactory->create();
        $ids = $productCollection->getAllIds();
        $batches = array_chunk($ids, $this->batchSize);
        $i = 0;
        foreach ($batches as $ids) {
            $singleProcessor = $this->factory->create();
            foreach ($ids as $id) {
                $singleProcessor->execute($id)
            }
            $i++;
            gc_collect_cycles();
        }

Then in the SingleProcessor class I want to use the ProductRepository

Even if I mark the repository argument as not shared, it will still increase

    <type name="Ioweb\ProductDisplay\Service\SingleProcessor" shared="false">
        <arguments>
            <argument name="productRepository" xsi:type="object" shared="false">Magento\Catalog\Api\ProductRepositoryInterface</argument>
        </arguments>
    </type>

So basically if I understand correctly all the classes injected in ProductRepository

        ProductFactory $productFactory,
        \Magento\Catalog\Controller\Adminhtml\Product\Initialization\Helper $initializationHelper,
        \Magento\Catalog\Api\Data\ProductSearchResultsInterfaceFactory $searchResultsFactory,
        \Magento\Catalog\Model\ResourceModel\Product\CollectionFactory $collectionFactory,
        \Magento\Framework\Api\SearchCriteriaBuilder $searchCriteriaBuilder,
        \Magento\Catalog\Api\ProductAttributeRepositoryInterface $attributeRepository,
        \Magento\Catalog\Model\ResourceModel\Product $resourceModel,
        \Magento\Catalog\Model\Product\Initialization\Helper\ProductLinks $linkInitializer,
        \Magento\Catalog\Model\Product\LinkTypeProvider $linkTypeProvider,
        \Magento\Store\Model\StoreManagerInterface $storeManager,
        \Magento\Framework\Api\FilterBuilder $filterBuilder,
        \Magento\Catalog\Api\ProductAttributeRepositoryInterface $metadataServiceInterface,
        \Magento\Framework\Api\ExtensibleDataObjectConverter $extensibleDataObjectConverter,
        \Magento\Catalog\Model\Product\Option\Converter $optionConverter,
        \Magento\Framework\Filesystem $fileSystem,
        ImageContentValidatorInterface $contentValidator,
        ImageContentInterfaceFactory $contentFactory,
        MimeTypeExtensionMap $mimeTypeExtensionMap,
        ImageProcessorInterface $imageProcessor,
        \Magento\Framework\Api\ExtensionAttribute\JoinProcessorInterface $extensionAttributesJoinProcessor,
        CollectionProcessorInterface $collectionProcessor = null,
        \Magento\Framework\Serialize\Serializer\Json $serializer = null,
        $cacheLimit = 1000,
        ReadExtensions $readExtensions = null,
        CategoryLinkManagementInterface $linkManagement = null

Would need to be marked as shared = false and then all internal classes for those again as shared = false

Otherwise the object manager would just reuse them all again.

Am I correct to assume this based on your examples?

Of course this is not a viable solution so what's the correct way to process giant loops of products if all internal objects are instantiated by the object manager and it's going to keep everything in memory forever?

So far I can't find the proper way to do it and anything we do ends up in memory leaks everywhere.

Sometimes asynchronous processing is not the solution.

E.g. think of a customer who needs to get an instant report of his affiliates' sales based on specific date range in the frontend. It's not suitable to tell him we'll send it later, he needs to view it as soon as possible.

Looping through each affiliate, then their orders via the repositories is bound to fail with the same memory exhaustions.

ioweb-gr commented 5 months ago

@hostep Do you think it would be possible to configure the object manager to not share instances and use factories instead within the scope of a plugin globaly for the framework? Then we could possibly modify this plugin to read a flag in CLI mode for example for specific use cases.

I haven't delved too much into how the object manager instantiates objects and caches them so any advice would be helpful.