Some status update in case people are interested.
We've recently been trying out the quality patch from Adobe, which is also supposed to fix this issue. We've been running it on a Magento 2.4.3-p3 shop for a couple of days. At first look it seems to work fine, but for 2 days in a row now we've noticed the index cronjob just sitting there for hours, doing nothing while waiting for the following SQL query to finish:
DELETE cl_duplicate
FROM catalog_product_attribute_cl AS cl_duplicate,
catalog_product_attribute_cl AS cl_origin
WHERE cl_duplicate.version_id < cl_origin.version_id
AND cl_duplicate.version_id > 93264739
AND cl_duplicate.version_id <= 93496654
AND cl_origin.version_id > 93264739
AND cl_origin.version_id <= 93496654
AND cl_duplicate.entity_id <=> cl_origin.entity_id
When I attempt to run this query on my local machine against a dbdump from the server, it also just runs for many, many minutes without seeming to make any progress (I stopped it after more than 10 minutes).
The catalog_product_attribute_cl table in our case contains 226253 rows, and an EXPLAIN query returns this:
mysql> EXPLAIN DELETE cl_duplicate FROM catalog_product_attribute_cl AS cl_duplicate, catalog_product_attribute_cl AS cl_origin WHERE cl_duplicate.version_id < cl_origin.version_id AND cl_duplicate.version_id > 93264739 AND cl_duplicate.version_id <= 93496654 AND cl_origin.version_id > 93264739 AND cl_origin.version_id <= 93496654 AND cl_duplicate.entity_id <=> cl_origin.entity_id;
+----+-------------+--------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| 1 | DELETE | cl_duplicate | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 113126 | 100.00 | Using where |
| 1 | SIMPLE | cl_origin | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 113126 | 10.00 | Using where |
+----+-------------+--------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
2 rows in set (0.00 sec)
(I'm not a DB expert, so I can't really interpret these results; I'm just providing them for extra info.)
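For illustration, the intent of that DELETE (keep only the newest changelog row per entity in the processed window) can be expressed without the quadratic non-equi self-join. A toy sketch in Python/SQLite (not MySQL; the table and data are made up, and MySQL's null-safe `<=>` becomes plain equality since entity_id is never NULL here):

```python
import sqlite3

# Toy stand-in for catalog_product_attribute_cl: version_id auto-increments,
# entity_id repeats whenever the same product changes again.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cl (version_id INTEGER PRIMARY KEY AUTOINCREMENT, entity_id INTEGER)")
con.executemany("INSERT INTO cl (entity_id) VALUES (?)",
                [(e,) for e in [1, 2, 1, 3, 1, 2]])

# Equivalent dedup without the self-join: keep only the newest
# version_id per entity_id.
con.execute("""
    DELETE FROM cl
    WHERE version_id NOT IN (
        SELECT MAX(version_id) FROM cl GROUP BY entity_id
    )
""")
rows = con.execute("SELECT entity_id, version_id FROM cl ORDER BY entity_id").fetchall()
print(rows)  # [(1, 5), (2, 6), (3, 4)] -- one row per entity, newest version kept
```

The NOT IN form gives the optimizer a single grouped subquery instead of comparing every changelog row against every other row in the window.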
So it sounds like that quality patch does not fully fix this issue. I think we'll switch back to the community PR, which seems to work better (we have been running it for some weeks on another project, using Magento 2.4.5-p2).
Just for the record: in parallel to this, a few months ago our developers (@mabaud) created a much simpler patch. It has already been deployed successfully on a project. See the patch attached here: https://github.com/magento/magento2/issues/37367
After digging into the walker logic, the merged PR seems to be overkill. The solution might have looked like this:
SELECT `catalog_product_price_cl`.`entity_id`, max(version_id) as max_version_id
FROM `catalog_product_price_cl`
GROUP BY `entity_id`
HAVING (max_version_id > 12166384) AND (max_version_id <= 12600229)
ORDER BY max_version_id ASC
LIMIT 1000;
or this: just get the MAX version for each entity, sort ascending by version, and paginate the result by BATCH_SIZE:
select * from (SELECT `catalog_product_price_cl`.`entity_id`, max(version_id) as max_version_id
FROM `catalog_product_price_cl`
GROUP BY `entity_id`
HAVING (max_version_id > 12166384) AND (max_version_id <= 12600229)
ORDER BY max_version_id ASC) as sb
where max_version_id >= 12166384
LIMIT 1000 OFFSET 0;
The new starting version is then the highest version actually processed:
$vsFrom = intval(max(array_column($changelogs, 'max_version_id')));
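The batched walker query above can be sketched against a toy SQLite table (a hypothetical stand-in for catalog_product_price_cl; version ranges and data are made up):

```python
import sqlite3

# Toy changelog with a handful of (version_id, entity_id) rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cl (version_id INTEGER PRIMARY KEY, entity_id INTEGER)")
con.executemany("INSERT INTO cl VALUES (?, ?)",
                [(1, 10), (2, 11), (3, 10), (4, 12), (5, 11)])

# Same shape as the proposed query: newest version per entity,
# filtered to the window, ordered by version, batched by LIMIT.
from_version, to_version, batch_size = 0, 5, 2
batch = con.execute("""
    SELECT entity_id, MAX(version_id) AS max_version_id
    FROM cl
    GROUP BY entity_id
    HAVING max_version_id > ? AND max_version_id <= ?
    ORDER BY max_version_id ASC
    LIMIT ?
""", (from_version, to_version, batch_size)).fetchall()
print(batch)  # [(10, 3), (12, 4)] -- each entity appears once, with its newest version
```

Because the batch is ordered by max_version_id, the highest version in the batch is a safe resume point for the next iteration.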
Another addition would be to update the view version after every batch iteration in \Magento\Framework\Mview\View::executeAction by adding:
$vsFrom += $batchSize;
$action->execute($ids);
$this->getState()->setVersionId($vsFrom)->save(); //this line
This would keep the last processed version in case of a process crash. Otherwise the cron would start changelog processing over from the beginning and loop the indexer.
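The whole checkpointing idea can be sketched in a few lines of Python. The names fetch_batch, execute_action, and save_state are hypothetical stand-ins for the walker query, the indexer action, and State::setVersionId()->save(); they are not real Magento APIs:

```python
# Hypothetical sketch of the checkpointed batch loop.
def process_changelog(fetch_batch, execute_action, save_state, vs_from, vs_to, batch_size):
    while vs_from < vs_to:
        # Each row is (entity_id, max_version_id): the newest version per entity.
        batch = fetch_batch(vs_from, vs_to, batch_size)
        if not batch:
            break
        execute_action([entity_id for entity_id, _ in batch])
        # Advance to the highest version actually processed, then persist it,
        # so a crash resumes from the last completed batch instead of looping.
        vs_from = max(version for _, version in batch)
        save_state(vs_from)
    return vs_from

# Tiny in-memory changelog: (version_id, entity_id) pairs.
changelog = [(1, 10), (2, 11), (3, 10), (4, 12)]

def fetch_batch(vs_from, vs_to, limit):
    # Newest version per entity inside the window, ordered by version.
    latest = {}
    for version, entity in changelog:
        if vs_from < version <= vs_to:
            latest[entity] = max(latest.get(entity, 0), version)
    return sorted(latest.items(), key=lambda row: row[1])[:limit]

processed, saved = [], []
end = process_changelog(fetch_batch, processed.extend, saved.append, 0, 4, 2)
print(end, processed, saved)  # 4 [11, 10, 12] [3, 4]
```

Note the checkpoint written after each batch (3, then 4): a crash between batches restarts from the last saved version rather than from the original starting point.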
Anyone interested in trying an adaptation of @dooblem's approach can use these patches:
Index: vendor/magento/framework/Mview/View.php
diff --git a/vendor/magento/framework/Mview/View.php b/vendor/magento/framework/Mview/View.php
--- a/vendor/magento/framework/Mview/View.php
+++ b/vendor/magento/framework/Mview/View.php (date 1701886312378)
@@ -314,7 +314,21 @@
break;
}
$vsFrom += $batchSize;
- $action->execute($ids);
+
+ //TASKID: port of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+ $nextVersionId = intval(max(array_column($ids, 'max_version_id') ?: [0]));
+ if ($nextVersionId > 0) {
+ $vsFrom = $nextVersionId;
+ }
+
+ $isMultidimensional = isset($ids[0]) && is_array($ids[0]);
+
+ if ($isMultidimensional) {
+ $action->execute(array_map('intval', array_column($ids, 'entity_id')));
+ } else {
+ $action->execute($ids);
+ }
+ $this->getState()->setVersionId($vsFrom)->save();
}
}
and the walker using the max version id and a HAVING clause:
Index: vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php
diff --git a/vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php b/vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php
--- a/vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php
+++ b/vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php (date 1701886718709)
@@ -40,8 +40,8 @@
if (!$connection->isTableExists($changelogTableName)) {
throw new ChangelogTableNotExistsException(new Phrase("Table %1 does not exist", [$changelogTableName]));
}
-
- $select = $connection->select()->distinct(true)
+ //TASKID: adaptation of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+ $select = $connection->select()
->where(
'version_id > ?',
$fromVersionId
@@ -50,10 +50,16 @@
'version_id <= ?',
$toVersion
)
+ //TASKID: adaptation of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+ ->having('max_version_id > ?', $fromVersionId)
+ ->having('max_version_id <= ?', $toVersion)
->group([$changelog->getColumnName()])
+ //TASKID: adaptation of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+ ->order('max_version_id ASC')
->limit($batchSize);
- $select->from($changelogTableName, [$changelog->getColumnName()]);
- return $connection->fetchCol($select);
+ //TASKID: adaptation of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+ $select->from($changelogTableName, [$changelog->getColumnName(), 'max_version_id' => new \Zend_Db_Expr('MAX(version_id)')]);
+ return $connection->fetchAll($select);
}
}
For me it works like a rocket. We have about 5K products but update them frequently, so the changelogs are huge (more than 200K rows). Now, instead of walking the changelog 1000 rows at a time, any partial reindex can process the whole catalog in just 5 iterations. Additionally, after every iteration the Mview version is saved to the DB.
Large stores with extremely active product databases can suffer many related problems when indexers are not kept up to date: products can disappear from the website, product data can be stale on the website, and excessive cache flushing can cause huge server load spikes.
This is due to the manner in which triggers are created by the core Magento code in https://github.com/magento/magento2/blob/9544fb243d5848a497d4ea7b88e08609376ac39e/lib/internal/Magento/Framework/Mview/View/Subscription.php#L201
INSERT INTO will insert a new record regardless of whether one already exists. REPLACE INTO will, as the name says, replace a record; and in this case it will still generate a new version_id, since that is an auto_increment field.
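The effect is easy to reproduce outside Magento. A minimal SQLite sketch (a stand-in table, not the real trigger) of why every save appends a fresh changelog row:

```python
import sqlite3

# Stand-in for a Mview changelog table: auto-increment version_id and no
# unique key on entity_id, so every INSERT creates a fresh row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cl (version_id INTEGER PRIMARY KEY AUTOINCREMENT, entity_id INTEGER)")

for _ in range(20):  # 20 saves of the same product, as the trigger would do
    con.execute("INSERT INTO cl (entity_id) VALUES (1)")

count, = con.execute("SELECT COUNT(*) FROM cl WHERE entity_id = 1").fetchone()
print(count)  # 20 rows for one product: the indexer sees 20 change events
```

This is why deduplicating by entity_id (or selecting MAX(version_id) per entity) matters: the raw changelog grows with every save, not with every distinct product.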
Preconditions (*)
Steps to reproduce (*)
Expected result (*)
entity_id 1 should be processed only once, on the latest version.
Actual result (*)
It will actually try to index each product 20 times, once for each update.
Additional information to reproduce the issue
1. Update the product 24-MB01 20 times via the API rest/all/V1/products/24-MB01; check catalog_product_price_cl.
2. Replace DEFAULT_BATCH_SIZE in file lib/internal/Magento/Framework/Mview/View.php with 10 from 1000.
3. Run the bin/magento dev:query-log:enable command. Please find attached the screenshot of db.log: the query on catalog_product_price_cl always returns entity_id 1 each time.
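For what it's worth, the re-processing described above can be simulated in a few lines of Python/SQLite, assuming (as an approximation of the stock behavior) that the walker batches the changelog by version_id windows and selects DISTINCT entity_id per window:

```python
import sqlite3

# Stand-in changelog: one product saved 20 times.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cl (version_id INTEGER PRIMARY KEY AUTOINCREMENT, entity_id INTEGER)")
for _ in range(20):
    con.execute("INSERT INTO cl (entity_id) VALUES (1)")

# Approximation of the stock walker: DISTINCT entity_id per version window.
batch_size, vs_from, vs_to = 10, 0, 20
batches = []
while vs_from < vs_to:
    ids = [row[0] for row in con.execute(
        "SELECT DISTINCT entity_id FROM cl WHERE version_id > ? AND version_id <= ?",
        (vs_from, min(vs_from + batch_size, vs_to)))]
    batches.append(ids)
    vs_from += batch_size
print(batches)  # [[1], [1]] -- the same entity is re-indexed once per window
```

With a batch size of 10 and 20 changelog rows for one product, the entity is handed to the indexer in every window; a MAX(version_id)-per-entity query would return it exactly once.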