Open ioweb-gr opened 9 months ago
Hi @ioweb-gr. Thank you for your report. To speed up processing of this issue, make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, Add a comment to the issue:
@magento give me 2.4-develop instance
Join Magento Community Engineering Slack and ask your questions in #github channel. :warning: According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting. :clock10: You can find the schedule on the Magento Community Calendar page. :telephone_receiver: The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket.
Hi @engcom-November. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:
- Add the Area: XXXXX label to the ticket, indicating the functional areas it may be related to.
- Verify that the issue is reproducible on the 2.4-develop branch. To deploy a test instance on Magento infrastructure, add the comment: @magento give me 2.4-develop instance
- If the issue is reproducible on the 2.4-develop branch, please add the label Reproduced on 2.4.x.
- Add the label Issue: Confirmed once verification is complete.
Hello @ioweb-gr,
Thank you for the report and collaboration!
Verified this issue on 2.4-develop. With the provided module we were able to reproduce this issue.
Please take a look at the screenshot below:
As mentioned in the actual result, the memory keeps increasing for every iteration.
Hence confirming the issue.
Thank you.
:white_check_mark: Jira issue https://jira.corp.adobe.com/browse/AC-10993 is successfully created for this GitHub issue.
:white_check_mark: Confirmed by @engcom-November. Thank you for verifying the issue.
Issue Available: @engcom-November, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.
I don't think this is a bug, because the provided module is built in such a way that it will keep increasing memory: $this->array
will continue to grow and is never cleared out, so it makes sense that the used memory builds up over time.
@ioweb-gr: if you purposely introduce a memory leak, then that bug is in your module and not in core Magento. So you should make sure you clean up the memory used at the end of your execute
method.
But if you are worried about consumers running out of memory (due to core Magento bugs or something), then you could lower the maxMessages
value, so that the consumer kills itself and starts up again fresh after it has processed a certain number of messages. That would be a workaround for a memory leak (though ideally you should try to avoid memory leaks in your consumers if possible).
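For reference, here is a sketch of where that cap lives, assuming the consumer name used elsewhere in this issue. The maxMessages value can be set per consumer in the module's etc/queue_consumer.xml; to my knowledge the --max-messages CLI flag overrides it at runtime:

```xml
<!-- etc/queue_consumer.xml: hypothetical consumer with a capped message count.
     After 100 messages the consumer process exits; cron or a supervisor starts
     it again fresh, so any leaked memory is reclaimed by the OS. -->
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="urn:magento:framework-message-queue:etc/consumer.xsd">
    <consumer name="replicatememoryissue"
              queue="replicatememoryissue"
              handler="Ioweb\ReplicateMemoryIssue\Model\Queue\Handler\Handler::execute"
              maxMessages="100"/>
</config>
```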
@hostep In Magento, all collection loops and SearchResults loops used for mass-processing products, orders, shipments, etc. suffer from memory leaks. I've opened at least 3 issues about this. For those, one recommendation was to move the work to an asynchronous queue so that it would be processed in batches by a new instance.
However, the module is built this way to show that the handler class is instantiated once and reused constantly, with ever-increasing memory usage due to inner allocations that never get cleared. So if that handler class were to loop over 1000 products, for example, in a catalog of 3m products, and process 10 messages, it would still hit memory limits as if I had written a simple foreach loop without a message queue.
So basically the idea that message queues would remove the excess memory usage through their batching is invalidated by this simple module.
I could create it with looping something like in #37639 for example if that would help.
The problem, in my opinion, is that message queues do not release memory after processing a message, e.g. by spawning a sub-process to handle the message, which would terminate and release its memory.
They just appear to be a glorified paginator running in the same loop, bounded by the maxMessages limit, if you get my analogy.
Hmm, your remarks kind of make sense.
I had a bit of fun today and adapted your module to experiment with factories and with marking objects as shared or not via di.xml;
here's what I came up with:
diff -ruN originalModule/Model/Queue/Handler/Handler.php newModule/Model/Queue/Handler/Handler.php
--- originalModule/Model/Queue/Handler/Handler.php 2024-02-07 19:58:09
+++ newModule/Model/Queue/Handler/Handler.php 2024-02-07 19:03:29
@@ -7,11 +7,13 @@
class Handler
{
- protected $array = [];
+ public function __construct(
+ private InternalHandlerFactory $internalHandlerFactory,
+ ) { }
+
public function execute()
{
- $uniqueId = uniqid();
- $this->array[$uniqueId] = str_repeat('A', 1024 * 1024 * 10);
- echo "Memory usage: " . memory_get_usage() / 1024 / 1024 . " MB\n";
+ $internalHandler = $this->internalHandlerFactory->create();
+ $internalHandler->execute();
}
}
diff -ruN originalModule/Model/Queue/Handler/InternalHandler.php newModule/Model/Queue/Handler/InternalHandler.php
--- originalModule/Model/Queue/Handler/InternalHandler.php 1970-01-01 01:00:00
+++ newModule/Model/Queue/Handler/InternalHandler.php 2024-02-07 19:56:20
@@ -0,0 +1,24 @@
+<?php
+
+namespace Ioweb\ReplicateMemoryIssue\Model\Queue\Handler;
+
+class InternalHandler
+{
+ public function __construct(
+ private TestObject $testObject,
+ ) {}
+
+ private $array = [];
+
+ public function execute()
+ {
+ $uniqueId = uniqid();
+ $this->array[$uniqueId] = str_repeat('A', 1024 * 1024 * 10);
+
+ $memoryUsage = memory_get_usage() / 1024 / 1024;
+ $testObjectId = spl_object_id($this->testObject);
+ $testObject2Id = spl_object_id($this->testObject->testObject2);
+
+ echo sprintf("Memory usage: %f MB, Test object id: %d, Test object 2 id: %d\n", $memoryUsage, $testObjectId, $testObject2Id);
+ }
+}
diff -ruN originalModule/Model/Queue/Handler/TestObject.php newModule/Model/Queue/Handler/TestObject.php
--- originalModule/Model/Queue/Handler/TestObject.php 1970-01-01 01:00:00
+++ newModule/Model/Queue/Handler/TestObject.php 2024-02-07 19:55:31
@@ -0,0 +1,10 @@
+<?php
+
+namespace Ioweb\ReplicateMemoryIssue\Model\Queue\Handler;
+
+class TestObject
+{
+ public function __construct(
+ public TestObject2 $testObject2,
+ ) {}
+}
diff -ruN originalModule/Model/Queue/Handler/TestObject2.php newModule/Model/Queue/Handler/TestObject2.php
--- originalModule/Model/Queue/Handler/TestObject2.php 1970-01-01 01:00:00
+++ newModule/Model/Queue/Handler/TestObject2.php 2024-02-07 19:55:46
@@ -0,0 +1,7 @@
+<?php
+
+namespace Ioweb\ReplicateMemoryIssue\Model\Queue\Handler;
+
+class TestObject2
+{
+}
diff -ruN originalModule/etc/di.xml newModule/etc/di.xml
--- originalModule/etc/di.xml 2024-02-07 19:58:09
+++ newModule/etc/di.xml 2024-02-07 19:52:48
@@ -16,4 +16,5 @@
</argument>
</arguments>
</type>
+ <type name="Ioweb\ReplicateMemoryIssue\Model\Queue\Handler\TestObject" shared="false"/>
</config>
If you execute this, you'll notice this output:
$ bin/magento queue:consumers:start replicatememoryissue --max-messages=15
Memory usage: 80.710022 MB, Test object id: 6086, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6118, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6116, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6073, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6080, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6081, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6084, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6083, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6082, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6086, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6118, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6116, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6073, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6080, Test object 2 id: 6117
Memory usage: 80.710022 MB, Test object id: 6081, Test object 2 id: 6117
...
Notice that by leveraging a factory, the memory usage stays constant. Also notice that TestObject2 remains the same object in memory during the execution, while TestObject is a new instance with every message being processed (due to marking it as shared="false"
in the di.xml
). This might be a bit counter-intuitive: you might think that if TestObject2 is a dependency of TestObject, it should also be a new instance each time, but that's not how Magento's ObjectManager works.
So yeah, I'm afraid Magento's ObjectManager is built in such a way that it tries to keep as much as possible in memory for performance reasons, so it doesn't need to re-initialise objects all the time. You can however force certain classes to become new instances by using factories, or mark types as shared="false"
.
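To make the shared/non-shared distinction concrete, here is an illustrative toy container in plain PHP (not Magento code, and it ignores constructor dependency resolution): get() serves a cached "shared" instance, create() (what a generated Factory calls under the hood) always builds a fresh one, and types marked non-shared are rebuilt on every get().

```php
<?php
// Toy sketch of the ObjectManager behavior described above.
class MiniObjectManager
{
    /** @var array<string, object> cache of shared instances */
    private array $shared = [];
    /** @var array<string, true> types marked shared="false" in di.xml */
    private array $nonShared = [];

    public function markNonShared(string $type): void
    {
        $this->nonShared[$type] = true;
    }

    public function get(string $type): object
    {
        if (isset($this->nonShared[$type])) {
            return new $type(); // non-shared: fresh instance on every get()
        }
        // default: build once, then keep serving the cached instance
        return $this->shared[$type] ??= new $type();
    }

    public function create(string $type): object
    {
        return new $type(); // always a fresh instance, like a Factory
    }
}

class Dependency {}

$om = new MiniObjectManager();
var_dump($om->get(Dependency::class) === $om->get(Dependency::class));       // bool(true)
var_dump($om->create(Dependency::class) === $om->create(Dependency::class)); // bool(false)
$om->markNonShared(Dependency::class);
var_dump($om->get(Dependency::class) === $om->get(Dependency::class));       // bool(false)
```

Note that in this toy model, as in Magento, a non-shared wrapper's own dependencies still come out of the shared cache unless they are individually marked non-shared too, which is exactly the TestObject/TestObject2 behavior shown in the output above.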
There is also a good write up of these systems by Alan Storm over here: https://alanastorm.com/magento_2_object_manager_instance_objects/
Not sure if this tells you anything new, but I found it interesting to do this experiment at least :)
Just be aware that different people have different needs: some will want to keep the current system, while you may want it to work differently. We can't have one system that works for everybody, unless we somehow add configuration flags to change the behavior.
First of all, it's awesome that you found the time to actually play around like this. I really appreciate the effort, and trust me, I understand what you're trying to convey.
On the other hand, if you look around the internet there are hundreds of questions about Magento's memory consumption, and I don't think Magento (or any application, really) should have to consume 20 GB of RAM to export its catalog, for example.
So if the recommended way to handle bulk tasks is asynchronous handlers via message queues, it would be a great addition to have a flag that adjusts the behavior so it doesn't keep filling up memory with likely unneeded objects.
I can give you other scenarios where memory keeps increasing in Magento without us being able to do anything about it.
For example, constantly using the resourceConnection to send SQL queries that are buffered, which is the default. The Zend/Laminas library handling the database connection behind the scenes causes ever-increasing memory consumption.
We can't really fix Zend or Laminas in that regard, so the framework we use should at least give us a way to work around it, or specific tasks will never realistically be feasible. Of course the best bet would be to track down all memory leaks, but as you noticed yourself, the issue usually lies in the ObjectManager.
And not only the ObjectManager: repositories also keep a cache of every object they retrieve, so memory gets abused at every corner in Magento.
I was hoping the queues would work around these limitations by actually forcing the instances to clean up their own mess, but to do that I'd have to always set maxMessages to 1 and keep spawning and respawning consumers, wasting time and CPU cycles. At least that's how I view it at this point.
@hostep
I'm trying to put this into practice in loops, but without any real success.
Let's say I have a class responsible for fetching 100000 product IDs and splitting them into batches of 1000. At the beginning of each batch it uses a factory to create a Service class instance, which can process a single item.
E.g.:
$productCollection = $this->collectionFactory->create();
$ids = $productCollection->getAllIds();
$batches = array_chunk($ids, $this->batchSize);
$i = 0;
foreach ($batches as $batchIds) {
    $singleProcessor = $this->factory->create();
    foreach ($batchIds as $id) {
        $singleProcessor->execute($id);
    }
    $i++;
    gc_collect_cycles();
}
Then in the SingleProcessor class I want to use the ProductRepository.
Even if I mark the repository argument as not shared, memory will still increase:
<type name="Ioweb\ProductDisplay\Service\SingleProcessor" shared="false">
<arguments>
<argument name="productRepository" xsi:type="object" shared="false">Magento\Catalog\Api\ProductRepositoryInterface</argument>
</arguments>
</type>
So basically, if I understand correctly, all the classes injected into ProductRepository:
ProductFactory $productFactory,
\Magento\Catalog\Controller\Adminhtml\Product\Initialization\Helper $initializationHelper,
\Magento\Catalog\Api\Data\ProductSearchResultsInterfaceFactory $searchResultsFactory,
\Magento\Catalog\Model\ResourceModel\Product\CollectionFactory $collectionFactory,
\Magento\Framework\Api\SearchCriteriaBuilder $searchCriteriaBuilder,
\Magento\Catalog\Api\ProductAttributeRepositoryInterface $attributeRepository,
\Magento\Catalog\Model\ResourceModel\Product $resourceModel,
\Magento\Catalog\Model\Product\Initialization\Helper\ProductLinks $linkInitializer,
\Magento\Catalog\Model\Product\LinkTypeProvider $linkTypeProvider,
\Magento\Store\Model\StoreManagerInterface $storeManager,
\Magento\Framework\Api\FilterBuilder $filterBuilder,
\Magento\Catalog\Api\ProductAttributeRepositoryInterface $metadataServiceInterface,
\Magento\Framework\Api\ExtensibleDataObjectConverter $extensibleDataObjectConverter,
\Magento\Catalog\Model\Product\Option\Converter $optionConverter,
\Magento\Framework\Filesystem $fileSystem,
ImageContentValidatorInterface $contentValidator,
ImageContentInterfaceFactory $contentFactory,
MimeTypeExtensionMap $mimeTypeExtensionMap,
ImageProcessorInterface $imageProcessor,
\Magento\Framework\Api\ExtensionAttribute\JoinProcessorInterface $extensionAttributesJoinProcessor,
CollectionProcessorInterface $collectionProcessor = null,
\Magento\Framework\Serialize\Serializer\Json $serializer = null,
$cacheLimit = 1000,
ReadExtensions $readExtensions = null,
CategoryLinkManagementInterface $linkManagement = null
would need to be marked as shared="false",
and then all their internal dependencies again as shared="false";
otherwise the ObjectManager would just reuse them all.
Am I correct to assume this based on your examples?
Of course this is not a viable solution, so what's the correct way to process giant loops of products if all internal objects are instantiated by the ObjectManager and it keeps everything in memory forever?
So far I can't find the proper way to do it, and everything we try ends up with memory leaks everywhere.
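For what it's worth, here is a plain-PHP sketch of the batching pattern I have in mind (no Magento classes; the numbers and names are illustrative): a generator yields one fixed-size batch of IDs at a time, so only the current batch is resident in memory, and the reference is dropped before collecting cycles.

```php
<?php
// Stream IDs in fixed-size batches via a generator instead of accumulating
// everything in one big array; release each batch before the next one.
function idBatches(int $total, int $batchSize): Generator
{
    for ($offset = 0; $offset < $total; $offset += $batchSize) {
        yield range($offset + 1, min($offset + $batchSize, $total));
    }
}

$count = 0;
foreach (idBatches(10500, 1000) as $batch) {
    foreach ($batch as $id) {
        $count++; // a real processor would handle each product here
    }
    unset($batch);       // drop the only reference to this batch
    gc_collect_cycles(); // reclaim any cyclic garbage
}
echo $count . "\n"; // 10500
```

Of course, this only bounds the memory your own loop holds; it does nothing about objects the ObjectManager or repositories cache internally, which is exactly the problem being discussed.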
Sometimes asynchronous processing is not the solution.
E.g. think of a customer who needs an instant report of his affiliates' sales for a specific date range in the frontend. It's not suitable to tell him we'll send it later; he needs to view it as soon as possible.
Looping through each affiliate and then their orders via the repositories is bound to fail with the same memory exhaustion.
@hostep Do you think it would be possible to configure the ObjectManager to not share instances and use factories instead, within the scope of a plugin, globally for the framework? Then we could possibly make this plugin read a flag in CLI mode, for example, for specific use cases.
I haven't delved too deeply into how the ObjectManager instantiates and caches objects, so any advice would be helpful.
Preconditions and environment
Steps to reproduce
I'm not exactly sure how to address this, or whether it's a bug, but it stems from my initial observation of a memory leak in #34851 and #37639. In my task of creating an XML feed for 3 million products I ran into #34851 and #37639, so I had the idea to break the operation into chunks of batches that can be processed asynchronously via queue messages.
I decided to follow this approach
Step 1. Gather all product IDs.
Step 2. Publish a message with, let's say, 3m IDs.
Step 3. In the handler, process 1000 of them, then publish a message with the remaining (3m - 1000) product IDs, until the queue is exhausted.
I thought that since it's an asynchronous operation, the handler would be invoked for every single message and then terminate, so each batch would be processed and the process would exit. Since PHP is an interpreted language, when a PHP process terminates it releases all memory used, which would give precise control over memory limits.
However, in reality I observed this is not the case: it suffers from the same problems a foreach loop suffers from.
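The three steps above can be sketched in plain PHP, simulated with an SplQueue instead of Magento's publisher/consumer (all names and numbers are illustrative):

```php
<?php
// Simulation of the re-publish approach: each "message" carries the remaining
// IDs; the handler takes the first 1000, processes them, and republishes the
// remainder as a new message until the queue is exhausted.
$queue = new SplQueue();
$queue->enqueue(range(1, 10500)); // Step 2: publish one message with all IDs
$batchSize = 1000;
$processed = 0;

while (!$queue->isEmpty()) {
    $ids = $queue->dequeue();                   // consume one message
    $batch = array_splice($ids, 0, $batchSize); // Step 3: take up to 1000 IDs
    foreach ($batch as $id) {
        $processed++; // a real handler would build feed entries here
    }
    if ($ids !== []) {
        $queue->enqueue($ids); // republish the remainder
    }
}
echo $processed . "\n"; // 10500
```

In this simulation everything runs in one process, which is exactly the behavior observed with the real consumer: the handler instance and its allocations survive across messages.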
To replicate this I created a mini module whose handler stores a large value in an array property on every message.
replicatememoryissue.zip
Find attached the module, which provides:
php bin/magento ioweb:replicatememorymessage
which will put 100 messages in the queue.
Now execute:
php bin/magento ioweb:replicatememorymessage
and then:
php bin/magento que:co:st replicatememoryissue
You will notice that it outputs an ever-increasing memory usage.
The problem here is that even with queues, if any part of the framework consumes memory and doesn't clean it up, we cannot guarantee the memory will be released when a batch is processed, leaving us at the mercy of all the memory leaks in Magento.
Expected result
Memory usage is not increasing per message as the allocation is consistent
Actual result
Memory usage keeps increasing
Additional information
No response
Release note
No response
Triage and priority