Closed gamort closed 10 months ago
Hi @gamort. Thank you for your report. To help us process this issue please make sure that you provided the following information:
Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:
@magento give me 2.4-develop instance - upcoming 2.4.x release
For more details, please, review the Magento Contributor Assistant documentation.
Please, add a comment to assign the issue: @magento I am working on this
Join Magento Community Engineering Slack and ask your questions in #github channel.
:warning: According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting.
:clock10: You can find the schedule on the Magento Community Calendar page.
:telephone_receiver: The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, please join the Community Contributions Triage session to discuss the appropriate ticket.
:movie_camera: You can find the recording of the previous Community Contributions Triage on the Magento Youtube Channel
:pencil2: Feel free to post questions/proposals/feedback related to the Community Contributions Triage process to the corresponding Slack Channel
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 14 days if no further activity occurs. Thank you for your contributions.
Issue makes sense I think, removing stale issue label...
@sidolov: would be great if somebody could verify if this is still the case in 2.4-develop (since the referenced code was refactored a couple of months ago) and then assign priorities.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 14 days if no further activity occurs. Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? Thank you for your contributions!
Sigh ...
@sidolov, @ihor-sviziev : can we please delete this stalebot crappy thing? What's the use of people opening issues if nobody looks at them and then they get auto closed? That's just depressing and might scare people away from opening more issues in the future ...
(not saying that this issue is valid, but it feels like it should at least be investigated instead of auto closed)
@sidolov @sivaschenko @gabrieldagama this should be p1 or p2 priority
BTW, it looks like the right solution is not at the time of adding the record to the DB, but at the indexation phase, to process all duplicated records at once
@hostep , @ihor-sviziev we will take a look at it and prioritize in case it's still reproducible
Hi @engcom-Foxtrot. Thank you for working on this issue. In order to make sure that the issue has enough information and is ready for development, please read and check the following instructions: :point_down:
[ ] 1. Verify that the issue has all the required information (Preconditions, Steps to reproduce, Expected result, Actual result). If the issue has a valid description, the label `Issue: Format is valid` will be added to the issue automatically. Please edit the issue description if needed, until the `Issue: Format is valid` label appears.
[ ] 2. Verify that the issue has a meaningful description and provides enough information to reproduce it. If the report is valid, add the `Issue: Clear Description` label to the issue yourself.
[ ] 3. Add `Component: XXXXX` label(s) to the ticket, indicating the components it may be related to.
[ ] 4. Verify that the issue is reproducible on the `2.4-develop` branch.
- Add the comment `@magento give me 2.4-develop instance` to deploy a test instance on Magento infrastructure.
- If the issue is reproducible on the `2.4-develop` branch, please add the label `Reproduced on 2.4.x`.
- If the issue is not reproducible, add a comment that the issue is not reproducible, close the issue, and stop the verification process here!
[ ] 5. Add the label `Issue: Confirmed` once verification is complete.
[ ] 6. Make sure that the automatic system confirms that the report has been added to the backlog.
Hello, @gamort. I've checked this issue on the 2.4-develop branch and it seems like it's already fixed. Could you please check it on the latest codebase?
I think we have the same problem with 2.4.2; we keep getting locks in different use cases.
An error occurred during creating the deployment package, error message: Unable to execute command "php7.4 /tmp/builds/d-3e55edc1/bin/magento setup:upgrade 2>&1" - Running schema recurring...
SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction, query was:
UPDATE cron_schedule SET `status` = ?, `messages` = ? WHERE (status = 'running')
And maybe that is why this extension was made? https://github.com/magemojo/m2-ce-cron
Hi @engcom-Hotel. Thank you for working on this issue. In order to make sure that the issue has enough information and is ready for development, please read and check the following instructions: :point_down:
[ ] 1. Verify that the issue has all the required information (Preconditions, Steps to reproduce, Expected result, Actual result). If the issue has a valid description, the label `Issue: Format is valid` will be added to the issue automatically. Please edit the issue description if needed, until the `Issue: Format is valid` label appears.
[ ] 2. Verify that the issue has a meaningful description and provides enough information to reproduce it. If the report is valid, add the `Issue: Clear Description` label to the issue yourself.
[ ] 3. Add `Component: XXXXX` label(s) to the ticket, indicating the components it may be related to.
[ ] 4. Verify that the issue is reproducible on the `2.4-develop` branch.
- Add the comment `@magento give me 2.4-develop instance` to deploy a test instance on Magento infrastructure.
- If the issue is reproducible on the `2.4-develop` branch, please add the label `Reproduced on 2.4.x`.
- If the issue is not reproducible, add a comment that the issue is not reproducible, close the issue, and stop the verification process here!
[ ] 5. Add the label `Issue: Confirmed` once verification is complete.
[ ] 6. Make sure that the automatic system confirms that the report has been added to the backlog.
Hello @gamort,
Please try to reproduce this issue in the 2.4-develop branch and let us know if you are still able to reproduce the issue.
Thanks
Hello @gamort,
We have noticed that this issue has not been updated for a period of 14 days. Hence we assume that this issue is fixed now, so we are closing it. Please raise a fresh ticket or reopen this ticket if you need more assistance on this.
Regards
I believe this issue is still valid; could you still try to reproduce this, @engcom-Hotel?
What you should look for (if I understand the issue correctly) is duplicated `entity_id`'s in the changelog tables (like `catalogsearch_fulltext_cl`, `catalogrule_product_cl`, `catalog_product_attribute_cl`, ...) and then figure out whether, when the partial reindexing system runs, the same `entity_id` is being processed once or more than once.
The expectation is that the same `entity_id` is only processed once in a single partial reindex run.
Thanks!
@hostep,
We might have some potential deadlocks by replacing `insert` with `replace`.
Aren't we fetching the unique values before doing the reindex?
https://github.com/magento/magento2/blob/e4a2d3cdd2a1d427a35009c89b52fe570aba2847/lib/internal/Magento/Framework/Mview/View/Changelog.php#L214-L227
and
https://github.com/magento/magento2/blob/e4a2d3cdd2a1d427a35009c89b52fe570aba2847/lib/internal/Magento/Framework/Mview/View/ChangeLogBatchWalker.php#L44-L57
Yeah, indeed, I would handle making them unique when reading and not when writing.
But it appears this already happens, as you indicate, due to the `distinct` call, so maybe the issue is no longer reproducible at this time?
@hostep, yeah, that's my point - it shouldn't reproduce.
Hello @hostep and @ihor-sviziev,
I'll share my final conclusion after some tests.
I migrated from 2.3.4 to 2.4.3-p1 recently. After a full product and stock import, I have some very long scheduled indexes, and some are even stuck forever. It creates 800k lines with only 30k distinct for "catalogrule_product_cl", as an example.
(I have 0 rules in place, but the indexer takes more than 3 days and never finishes... this is another issue: a foreach over products, then a foreach over rules.)
But for me, by reading the source code, there is an issue when the mview is being updated if the same entity_id is present in two different chunks of the batch. On 2.3.4, they first retrieve the max 100k versions, make them unique and batch on them.
2.4.3-p1
Franck.
Hi @fgarnier-hublot, Could you report it as a separate issue? Thank you!
Hi, sorry to respond on a closed issue, but I think this is still relevant and I didn't find any other related issue.
The problem clearly comes from the `ChangeLogBatchWalker::walk()` function.
It is missing a LIMIT offset, so it actually loops multiple times over the same values. Here is a simple example:
- Fill your changelog table with 10000 rows containing only 3 distinct entity_ids (1, 2, 3). The default batch size is 1000.
- The walk() function will perform a SELECT DISTINCT on rows 0 to 10000 with LIMIT 1000; it will return: 1, 2, 3.
- On the second call, it will perform a SELECT DISTINCT on rows 1000 to 10000 with LIMIT 1000, which also returns 1, 2, 3...
- Same for 8 more useless calls!
The easy fix would be to add an offset argument to the walk() function so that, on the second call, we could perform a SELECT DISTINCT on rows 0 to 10000 with LIMIT 1000 OFFSET 1000, which would return, in our example, no results and end the walk loop.
Hope it helps. Regards
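To make the suggested fix concrete, here is a sketch of the two queries side by side (illustrative only; the table name follows Magento's changelog convention and the PHP wiring in the walker is omitted):

```sql
-- Current behaviour on the second batch: the version_id window is
-- advanced, but the DISTINCT result set is the same 3 entity_ids again.
SELECT DISTINCT entity_id
FROM catalog_product_price_cl
WHERE version_id > 1000 AND version_id <= 10000
LIMIT 1000;

-- Suggested fix: keep the full version_id window and page through the
-- DISTINCT entity_ids with an OFFSET instead.
SELECT DISTINCT entity_id
FROM catalog_product_price_cl
WHERE version_id > 0 AND version_id <= 10000
LIMIT 1000 OFFSET 1000;  -- no rows left in the 3-entity example, loop ends
```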
@ihor-sviziev: can this be re-opened and re-validated? Now that there is more info.
Hello @hostep,
Let me reopen this for further investigation.
Thanks
Having had this crushing our relatively small store (few thousand products, a dozen customer groups, a dozen catalog price rules) by hanging up the process for indexer_update_all_views for hours: I'm trying a different approach.
Rather than building out a massive batch walker update to deal with multiple version_ids per entity_id and get the max for each one, I'm updating the generic table definitions and triggers so that:
- The entity_id is inserted if it isn't present already
- If it already exists the version_id is updated to the newest auto_increment value since all we really care about is that the "latest" version_id in the changelog table for an entity_id is greater than the pointer for that table stored in mview_state.
Yes, there's additional overhead, but it's orders of magnitude less than churning the indexers constantly. This still required manual intervention listed below to deploy, but until the index is added the code continues to function similarly to Magento's current implementation.
- Setting indexers to "update on save"
- Flushing cache
- Deleting the existing _cl tables manually
- Resetting the mview_state.version_id to 0 for all indexers
- Setting the indexers back to "update by schedule" to recreate the _cl tables
- Invalidating existing indexes and reindexing to ensure all changes are captured
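A rough sketch of what such a reworked table and trigger could look like (hypothetical names and a simplified trigger; assumes MySQL, where REPLACE deletes the conflicting row and re-inserts it, so the entity keeps exactly one row whose version_id is always the newest auto_increment value):

```sql
-- Changelog table with a unique constraint on the entity.
CREATE TABLE catalog_product_price_cl (
    version_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    entity_id  INT UNSIGNED NOT NULL,
    UNIQUE KEY unq_entity_id (entity_id)
);

-- Trigger: one row per entity; a repeated change bumps version_id
-- past the mview_state pointer instead of adding another row.
CREATE TRIGGER trg_catalog_product_price_cl_after_update
AFTER UPDATE ON catalog_product_entity FOR EACH ROW
    REPLACE INTO catalog_product_price_cl (entity_id)
    VALUES (NEW.entity_id);
```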
As the REPLACE statement is slower than INSERT, I think that approach could impact trigger performance and slow down large operations.
Also, what about a replacement occurring while the mview indexation is running? Some rows that would have been processed during this cron run would be delayed to the next cron, right?
Hello @gamort @hostep @bigbangx,
Thanks for your input on this issue. I have some findings and would like to share them here. In order to try to reproduce the issue, I have followed the below steps:
- Update the product `24-MB01` 20 times via the API `rest/all/V1/products/24-MB01`.
- Check the entries in the `catalog_product_price_cl` table.
- Change `DEFAULT_BATCH_SIZE` in the file `lib/internal/Magento/Framework/Mview/View.php` from 1000 to 10.
- Enable the query log via the `bin/magento dev:query-log:enable` command.
- Check the queries against `catalog_product_price_cl`. Please find attached the screenshot of db.log: the query always returns `entity_id` 1 each time.
In this case, it seems that entity_id 1 will be processed many times, instead of only once, and it should only be processed on the latest version.
Thanks
> As the REPLACE statement is slower than INSERT, I think that approach could impact trigger performance and slow down large operations.
> Also, what about a replacement occurring while the mview indexation is running? Some rows that would have been processed during this cron run would be delayed to the next cron, right?
@bigbangx You are correct that it does impact trigger performance, but for our store we weighed the impact at a couple of seconds of increased processing time for a catalog-wide update. The impact on saving product changes and other real-time actions like stock updates was negligible. We had far more problems with the backlog reprocessing changes on the same entities than we are having with the version_id being replaced.
Regarding the version_id updating during indexing: the entity_id has already been selected at that point, so it's going to be indexed. The max version_id pointer for that batch should also already be saved, so what should happen is:
This is essentially the same process as now, with the difference that if you have multiple iterations of the same entities being changed by multiple bulk actions, you won't get the oldest version ids for processing and work through the same set repeatedly.
The extreme edge case would be if you had so many products being updated that your indexer never caught up to the version_id of a particular entity because there were always more entities with older version_ids filling up your batch size. If you have that problem, it doesn't matter how this portion is coded: your catalog churn is faster than Magento's current indexer model can keep up with.
With all that being said: there are other issues that should be addressed with the current architecture like the sheer number of changelog entries that can be triggered: auto_increment int fields do run out of digits eventually. This was an expeditious band-aid that could be patched in to keep things running while an optimal long-term solution is explored.
We are confirming this issue as per our findings here: https://github.com/magento/magento2/issues/30012#issuecomment-1310391317
Also updated the main description to reproduce the issue.
Thanks
:white_check_mark: Jira issue https://jira.corp.adobe.com/browse/AC-7009 is successfully created for this GitHub issue.
:white_check_mark: Confirmed by @engcom-Hotel. Thank you for verifying the issue.
Issue Available: @engcom-Hotel, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.
I don't think this was specifically covered in the initial report, but on a related note we have discovered a major functional problem with the Magento partial indexer (indexer_update_all_views) starting with Magento 2.4.3, up to and including the latest version (2.4.6).
The indexer processes records from the changelog in batches. It gets the IDs to process here:
https://github.com/magento/magento2/blob/2.4.6/lib/internal/Magento/Framework/Mview/View.php#L308
This will generate a query like this:

```sql
SELECT DISTINCT $changeLogTable.entity_id
FROM $changeLogTable
WHERE (version_id > $fromVersionId) AND (version_id <= $toVersionId)
GROUP BY `entity_id`
LIMIT $batchSize
```
This will SELECT from the rows with version_id greater than $fromVersionId all the way up to $toVersionId. As a result of the GROUP BY, the results will also be ordered by entity_id.
Notably $toVersionId is not $fromVersionId + $batchSize. It's the maximum version ID from the changelog table at the time when processing of the backlog begins (fetched here: https://github.com/magento/magento2/blob/2.4.6/lib/internal/Magento/Framework/Mview/View.php#L255)
So let's say you have a large backlog greater than $fromVersionId, with entity_ids starting 1, 4, 7, etc.
It will process those, even if they are not in the range $fromVersionId + $batchSize.
Next it will increment $fromVersionId by $batchSize and then run through the loop again:
https://github.com/magento/magento2/blob/2.4.6/lib/internal/Magento/Framework/Mview/View.php#L313
This means with a large backlog it is not guaranteed that the entities from the previous batch are actually the ones that get processed.
I'm observing this on a project right now with a very large backlog and what it's doing is just processing the same set of entity_ids over and over for each batch, which are the lowest ones and aren't even the ones that are in the batch it's supposed to be processing. Essentially none of the data is indexed correctly by the partial indexer.
Hi @kandy. Thank you for working on this issue. Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please go through the following instructions:
- Add `Area: XXXXX` label(s) to the ticket, indicating the functional areas it may be related to.
- Verify that the issue is reproducible on the `2.4-develop` branch. Add the comment `@magento give me 2.4-develop instance` to deploy a test instance on Magento infrastructure.
- If the issue is reproducible on the `2.4-develop` branch, please add the label `Reproduced on 2.4.x`.
Hi @vzabaznov. Thank you for working on this issue. Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please go through the following instructions:
- Add `Area: XXXXX` label(s) to the ticket, indicating the functional areas it may be related to.
- Verify that the issue is reproducible on the `2.4-develop` branch. Add the comment `@magento give me 2.4-develop instance` to deploy a test instance on Magento infrastructure.
- If the issue is reproducible on the `2.4-develop` branch, please add the label `Reproduced on 2.4.x`.
Hi @ddzhura. Thank you for working on this issue. Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please go through the following instructions:
- Add `Area: XXXXX` label(s) to the ticket, indicating the functional areas it may be related to.
- Verify that the issue is reproducible on the `2.4-develop` branch. Add the comment `@magento give me 2.4-develop instance` to deploy a test instance on Magento infrastructure.
- If the issue is reproducible on the `2.4-develop` branch, please add the label `Reproduced on 2.4.x`.
Hi @chernenm. Thank you for working on this issue. Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please go through the following instructions:
- Add `Area: XXXXX` label(s) to the ticket, indicating the functional areas it may be related to.
- Verify that the issue is reproducible on the `2.4-develop` branch. Add the comment `@magento give me 2.4-develop instance` to deploy a test instance on Magento infrastructure.
- If the issue is reproducible on the `2.4-develop` branch, please add the label `Reproduced on 2.4.x`.
@mpchadwick, solid analysis as always, thanks! By chance, does this also mean that the partial indexing never finishes, or at least runs very long, if it processes the same IDs over and over again?
@sprankhub - It goes through the backlog based on the batch size. It should take around the same amount of time regardless of the issue I outlined above as the backlog size would be the same. It's just that with the issue I described in each batch it can process the wrong entities, and then move forward to the next batch and the entities that should have been processed from previous batch never get processed.
An explanation from a colleague which may help clarify, if there is confusion:
Since it is rather hidden in this issue, I would like to mention again that there is a promising pull request already, which might fix the issue.
I wonder if the answer here is to throw away the `auto_increment` column from the change-log tables (`*_cl`), and only store the ID of the entity that's changed (if needed, with a `UNIQUE` constraint and use `insert or ignore into ...` in the trigger to avoid duplicates); on the indexer side we can delete values when they're processed. The "how many in change-log table" statistic can then be `select count(*) from table_cl` instead of tracking the last-seen ID and searching for any new values. This also solves the problem of running out of `auto_increment` numbers in the change-log table for a busy catalogue (which doesn't seem to be part of this conversation yet).
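A rough sketch of that shape (hypothetical; MySQL spells "insert or ignore" as `INSERT IGNORE`, and the table name here is just illustrative):

```sql
-- Changelog table with no auto_increment version counter:
-- at most one row per changed entity.
CREATE TABLE catalog_product_price_cl (
    entity_id INT UNSIGNED NOT NULL,
    UNIQUE KEY unq_entity_id (entity_id)
);

-- Trigger body: record the change, ignoring duplicates.
INSERT IGNORE INTO catalog_product_price_cl (entity_id)
VALUES (NEW.entity_id);

-- Indexer side: read a batch, index it, delete the processed rows.
SELECT entity_id FROM catalog_product_price_cl LIMIT 1000;
DELETE FROM catalog_product_price_cl WHERE entity_id IN (/* processed ids */);

-- Backlog size statistic, replacing the version_id pointer arithmetic.
SELECT COUNT(*) FROM catalog_product_price_cl;
```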
We've built a module to address this issue some years ago: https://github.com/ho-nl/magento2-ReachDigital_BetterIndexer
And a few other fixes I don't specifically remember.
Confirmed over here as well. Seeing this problem on a shop we recently upgraded from Magento 2.3.5 to 2.4.5-p2
After some quick local testing, the module https://github.com/ho-nl/magento2-ReachDigital_BetterIndexer seems to improve it a lot (with a backlog of 3861 items in the `catalog_product_attribute` indexer, default Magento 2.4.5 loops 203 times over the collection using a batch size of 1000, which is insane; the ReachDigital_BetterIndexer only needs 5 loops, which makes a lot more sense).
Haven't tested out the mentioned PR yet, will try to do that as well.
I've tested the PR as well and it seems to be even more efficient than the ReachDigital_BetterIndexer module.
With the same dataset, the PR only takes 4 loops and fills up the loops with 1000 items always (except for the last loop which just contains the remainder). The ReachDigital_BetterIndexer module however didn't fill up every loop with 1000 items, only the first and third loop contained 1000 items in my case, the others were ~650, ~750 and ~450 items big.
So both are good solutions at first sight. We'll probably try out the PR on a real project in production soon-ish, as the ReachDigital_BetterIndexer contains more than just this bugfix and I'm a bit hesitant to add more changes than only a fix for this problem without investigating them in more detail.
Thanks to all the people who commented here with your findings and proposed solutions, it really helped. Let's hope Adobe will also start taking this issue a bit more seriously, because it's a really big problem in my opinion that causes a lot of unnecessary resource usage on servers for bigger Magento projects when using the partial reindexers.
Update: some actual numbers. After the PR was implemented, the partial reindexing system took 14 minutes to run fully. Without the PR, it was running for more than 2 hours before I gave up, and `indexer:status` still showed that nothing had changed, so without the PR this would have taken a very, very long time to finish.
> The ReachDigital_BetterIndexer module however didn't fill up every loop with 1000 items, only the first and third loop contained 1000 items in my case, the others were ~650, ~750 and ~450 items big.
Not sure if there is any measurable perf difference; the batch size is chosen arbitrarily at 1000, so there are no guarantees that more will be faster.
> as the ReachDigital_BetterIndexer contains more than just this bugfix
Yeah, it contains another important step: memory can become an issue when running very large batches or faulty indexers.
By default, Magento will only save the indexer state after everything is indexed: https://github.com/magento/magento2/blob/9544fb243d5848a497d4ea7b88e08609376ac39e/lib/internal/Magento/Framework/Mview/View.php#L309
In the indexer it will save after every batch, which makes recovering a lot easier: https://github.com/ho-nl/magento2-ReachDigital_BetterIndexer/blob/master/src/Plugin/ViewPlugin.php#L167-L169
Also on crash it will not leave the indexer in a processing state: https://github.com/ho-nl/magento2-ReachDigital_BetterIndexer/blob/master/src/Plugin/ViewPlugin.php#L197
> Also on crash it will not leave the indexer in a processing state: https://github.com/ho-nl/magento2-ReachDigital_BetterIndexer/blob/master/src/Plugin/ViewPlugin.php#L197
Just FYI in case you weren't aware of this, as it is a well-hidden feature in Magento: since Magento 2.4.3 this can also be fixed by adding the following to your app/etc/env.php file (it's a completely different approach, but should give more or less the same end result):

```php
'indexer' => [
    'use_application_lock' => true
]
```
Documentation: https://developer.adobe.com/commerce/php/development/components/indexing/#using-application-lock-mode-for-reindex-processes
Found a new quality patch that Adobe released yesterday, which is most likely also related to this issue: https://github.com/magento/quality-patches/compare/6ad18382c1...763cd71220#diff-e7a95f179c6968989ee14e35fa76700d6674f0fb45b9aac96b844df61f7c14f9
Description:
ACSD-51114 : Fixes the issue where a random product disappears from large catalogs when asynchronous indexing is enabled. Improves performance of asynchronous reindexing for large catalogs.
Also another potential fix was mentioned in https://github.com/magento/magento2/issues/37367
Haven't tried any of these yet, but they sound promising as well...
And one more related issue just reported https://github.com/magento/magento2/issues/37398. Looks like it should be fixed all together
I've prepared a PR implementing an ORDER BY clause in the select query just yesterday: https://github.com/magento/magento2/pull/37399. It's funny that the official patch ACSD-50814 was also prepared yesterday, since this bug is an old one.
I've pointed out in #37398 that implementing ORDER BY won't fix it completely. When there is more than one change resulting in an insert into the _cl table, there will be that many records with this id. This is a situation where a MySQL query with DISTINCT is useless, because of the LIMIT in it. The best approach to fix it, IMHO, is to implement batching in PHP: make the values unique and then chunk and reindex them.
Also, many indexers already have batching features implemented, so maybe we could completely remove batching from the view update action and just move the responsibility to the indexers themselves.
e.g.
- vendor/magento/module-catalog/Model/Indexer/Category/Flat/Action/Rows.php:147
- vendor/magento/module-catalog/Model/Indexer/Product/Price/Action/Rows.php:100
But there are some indexers where batching is not implemented:
- vendor/magento/module-catalog-inventory/Model/Indexer/Stock/Action/Rows.php:24
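The dedupe-then-chunk idea could be sketched roughly like this (illustrative only; the variables stand in for the mview state values, and the chunking loop would be application code rather than SQL):

```sql
-- Fetch every distinct changed entity once for the whole version
-- window, with no LIMIT, so duplicates collapse before any batching.
SELECT DISTINCT entity_id
FROM catalog_product_price_cl
WHERE version_id > @from_version AND version_id <= @to_version
ORDER BY entity_id;

-- The application then splits this unique id list into chunks of
-- batch_size and hands each chunk to the indexer exactly once.
```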
Large stores with extremely active product databases can suffer many related problems with indexers not being kept up to date. Products can disappear from the website, product data can be out of date on the website, excessive cache flushing can cause huge server load spikes.
This is due to the manner in which triggers are created by the core magento code in https://github.com/magento/magento2/blob/9544fb243d5848a497d4ea7b88e08609376ac39e/lib/internal/Magento/Framework/Mview/View/Subscription.php#L201
INSERT INTO will insert a new record regardless of whether one exists or not. REPLACE INTO will, as the name says, replace a record, and in this case it will generate a new version_id, since that is an auto_increment field.
Preconditions (*)
Steps to reproduce (*)
Expected result (*)
Actual result (*)
It will actually try to index each product 20 times, once for each update
Additional information to reproduce the issue:
- Update the product `24-MB01` 20 times via the API `rest/all/V1/products/24-MB01`.
- Check the entries in the `catalog_product_price_cl` table.
- Change `DEFAULT_BATCH_SIZE` in the file `lib/internal/Magento/Framework/Mview/View.php` from 1000 to 10.
- Enable the query log via the `bin/magento dev:query-log:enable` command.
- Check the queries against `catalog_product_price_cl`. Please find attached the screenshot of db.log: the query always returns `entity_id` 1 each time.
In this case, it seems that entity_id 1 will be processed many times, instead of only once, and it should only be processed on the latest version.
Please provide Severity assessment for the Issue as Reporter. This information will help during Confirmation and Issue triage processes.