magento / magento2


Asynchronous indexing can break large servers #30012

Closed gamort closed 1 year ago

gamort commented 3 years ago

Large stores with extremely active product databases can suffer many related problems when indexers are not kept up to date: products can disappear from the website, product data can go stale, and excessive cache flushing can cause huge server load spikes.

This is due to the manner in which triggers are created by the core Magento code in https://github.com/magento/magento2/blob/9544fb243d5848a497d4ea7b88e08609376ac39e/lib/internal/Magento/Framework/Mview/View/Subscription.php#L201

INSERT INTO inserts a new record regardless of whether one already exists for the same entity. REPLACE INTO would, as the name says, replace a record, but in this case it would still generate a new version_id, since that is an auto_increment field.
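
For illustration, here is a simplified sketch of the kind of changelog trigger the subscription code generates (an illustrative assumption; the exact SQL built at the linked line differs):

-- Simplified sketch, not the literal trigger Magento creates.
-- Every UPDATE on a watched table appends a changelog row, so 20
-- updates to the same product produce 20 rows for the same entity_id.
CREATE TRIGGER trg_catalog_product_entity_after_update
AFTER UPDATE ON catalog_product_entity
FOR EACH ROW
INSERT INTO catalog_product_price_cl (entity_id) VALUES (NEW.entity_id);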

Preconditions (*)

  1. An extremely large database, for example 500,000 products
  2. An active update schedule

Steps to reproduce (*)

  1. Create a sample database with 500,000 products
  2. Enable Asynchronous indexing
  3. Reduce the indexer cron to run every 10 minutes
  4. Run a mass script that changes the inventory of the products every 30 seconds. Within 10 minutes that is 20 update passes over 500,000 products, i.e. 10 million updates

Expected result (*)

  1. When the indexer runs, it will index the latest product information once for each index

Actual result (*)

It will actually try to index each product 20 times, once for each update

Additional information to reproduce the issue

  1. Change the price of the product with SKU 24-MB01 20 times via API rest/all/V1/products/24-MB01
  2. Each price change inserts a row into the catalog_product_price_cl table
  3. Reduce the DEFAULT_BATCH_SIZE in lib/internal/Magento/Framework/Mview/View.php from 1000 to 10.
  4. Enable the DB logs via the bin/magento dev:query-log:enable command.
  5. Grep the log for the SQL queries on the catalog_product_price_cl table. [screenshot of db.log omitted]

The query returns entity_id 1 every time.

In this case it seems that entity_id 1 will be processed many times instead of only once, when it should be processed once based on the latest version.
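
To make the failure mode concrete, here is a sketch of what the changelog holds after the 20 price updates above, and why batching re-selects the same entity (illustrative values, not an actual dump):

-- Assumed changelog contents after 20 price updates of one product:
SELECT version_id, entity_id FROM catalog_product_price_cl;
-- version_id | entity_id
--          1 |         1
--          2 |         1
--        ... |       ...
--         20 |         1
--
-- With DEFAULT_BATCH_SIZE = 10, the batch walker selects DISTINCT
-- entity_id per version_id window, so entity_id 1 comes back once for
-- versions 1-10 and again for versions 11-20: it is reindexed once per
-- batch instead of once for the whole changelog.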


hostep commented 1 year ago

A status update in case people are interested.

We've recently been trying out the quality patch from Adobe that is also supposed to fix this issue. We've been running it on a Magento 2.4.3-p3 shop for a couple of days, and at first sight it seemed to work fine, but for 2 days in a row now we've noticed the index cronjob just sitting there for hours, doing nothing while waiting for the following SQL query to finish:

DELETE cl_duplicate
FROM catalog_product_attribute_cl AS cl_duplicate,
     catalog_product_attribute_cl AS cl_origin
WHERE cl_duplicate.version_id < cl_origin.version_id
  AND cl_duplicate.version_id > 93264739
  AND cl_duplicate.version_id <= 93496654
  AND cl_origin.version_id > 93264739
  AND cl_origin.version_id <= 93496654
  AND cl_duplicate.entity_id <=> cl_origin.entity_id

When I run this query on my local machine against a db dump from the server, it also just runs for many, many minutes without appearing to make any progress (I stopped it after more than 10 minutes). The catalog_product_attribute_cl table in our case contains 226,253 rows, and an EXPLAIN query returns this:

mysql> EXPLAIN DELETE cl_duplicate FROM catalog_product_attribute_cl AS cl_duplicate, catalog_product_attribute_cl AS cl_origin WHERE cl_duplicate.version_id < cl_origin.version_id AND cl_duplicate.version_id > 93264739 AND cl_duplicate.version_id <= 93496654 AND cl_origin.version_id > 93264739 AND cl_origin.version_id <= 93496654 AND cl_duplicate.entity_id <=> cl_origin.entity_id;
+----+-------------+--------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table        | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+--------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | DELETE      | cl_duplicate | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 113126 |   100.00 | Using where |
|  1 | SIMPLE      | cl_origin    | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 113126 |    10.00 | Using where |
+----+-------------+--------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
2 rows in set (0.00 sec)

(I'm not a DB expert, so I can't really interpret these results; I'm just providing them as extra info.)
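
One plausible reading of that EXPLAIN output (my interpretation, not confirmed in the thread): both aliases only use the PRIMARY key to narrow the version_id range, and nothing indexes the entity_id join predicate, so MySQL effectively compares every row pair inside the ~113k-row range. A hypothetical mitigation would be to give the join predicate an index, for example:

-- Hypothetical, not part of any official patch: an index covering the
-- join column so the duplicate-removal self-join can use it.
ALTER TABLE catalog_product_attribute_cl
    ADD INDEX idx_entity_version (entity_id, version_id);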

So it sounds like that quality patch does not fully fix this issue. I think we'll switch back to the community PR, which seems to work better (we have been running it for some weeks on another project, using Magento 2.4.5-p2).

dooblem commented 9 months ago

Just for the record: in parallel to this, a few months ago our developers (@mabaud) created a much simpler patch. It has already been deployed successfully on a project. See the patch attached here: https://github.com/magento/magento2/issues/37367

ilnytskyi commented 9 months ago

After digging into the walker logic, the merged PR seems like overkill. The solution might have looked like this:

  1. Get the max version for every entity, with something like this:
    SELECT `catalog_product_price_cl`.`entity_id`, max(version_id) as max_version_id
    FROM `catalog_product_price_cl`
    GROUP BY `entity_id`
    HAVING  (max_version_id > 12166384) AND (max_version_id <= 12600229)
    ORDER BY max_version_id ASC
    LIMIT 1000;

    or this: just get the MAX version per entity, sort ascending by version, and paginate the result by BATCH_SIZE.

    select * from (SELECT `catalog_product_price_cl`.`entity_id`, max(version_id) as max_version_id
               FROM `catalog_product_price_cl`
               GROUP BY `entity_id`
               HAVING  (max_version_id > 12166384) AND (max_version_id <= 12600229)
               ORDER BY max_version_id ASC) as sb
    where max_version_id >= 12166384
    LIMIT 1000 OFFSET 0;
  2. As @dooblem suggested, just start the next batch from the max version of the current batch:
    $vsFrom = intval(max(array_column($changelogs, 'max_version_id')));

Another addition would be to update the view version after every batch iteration here:

\Magento\Framework\Mview\View::executeAction

add

            $vsFrom += $batchSize;
            $action->execute($ids);
            $this->getState()->setVersionId($vsFrom)->save(); // this added line persists progress after each batch

This would keep the last processed version in case the process crashes. Otherwise the cron would start the changelog processing over from the beginning and loop the indexer.
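
For context, a sketch of where that version is persisted, assuming the standard mview_state schema (the view id shown is the price view; values are whatever was last saved):

-- mview_state tracks, per view, the last processed changelog version.
-- setVersionId(...)->save() advances version_id, so after a crash the
-- next cron run resumes from the last completed batch instead of
-- re-walking the whole changelog window.
SELECT view_id, mode, status, version_id
FROM mview_state
WHERE view_id = 'catalog_product_price';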

ilnytskyi commented 9 months ago

Whoever is interested in trying an adaptation of the @dooblem approach can use these patches:

Index: vendor/magento/framework/Mview/View.php
===================================================================
diff --git a/vendor/magento/framework/Mview/View.php b/vendor/magento/framework/Mview/View.php
--- a/vendor/magento/framework/Mview/View.php
+++ b/vendor/magento/framework/Mview/View.php   (date 1701886312378)
@@ -314,7 +314,21 @@
                 break;
             }
             $vsFrom += $batchSize;
-            $action->execute($ids);
+
+            //TASKID: port of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+            $nextVersionId = intval(max(array_column($ids, 'max_version_id') ?: [0]));
+            if ($nextVersionId > 0) {
+                $vsFrom = $nextVersionId;
+            }
+
+            $isMultidimensional = isset($ids[0]) && is_array($ids[0]);
+
+            if ($isMultidimensional) {
+                $action->execute(array_map('intval', array_column($ids, 'entity_id')));
+            } else {
+                $action->execute($ids);
+            }
+            $this->getState()->setVersionId($vsFrom)->save();
         }
     }

and the walker, using the max version id and a HAVING clause:

Index: vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php
===================================================================
diff --git a/vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php b/vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php
--- a/vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php
+++ b/vendor/magento/framework/Mview/View/ChangeLogBatchWalker.php  (date 1701886718709)
@@ -40,8 +40,8 @@
         if (!$connection->isTableExists($changelogTableName)) {
             throw new ChangelogTableNotExistsException(new Phrase("Table %1 does not exist", [$changelogTableName]));
         }
-
-        $select = $connection->select()->distinct(true)
+        //TASKID: adaptation of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+        $select = $connection->select()
             ->where(
                 'version_id > ?',
                 $fromVersionId
@@ -50,10 +50,16 @@
                 'version_id <= ?',
                 $toVersion
             )
+            //TASKID: adaptation of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+            ->having('max_version_id > ?', $fromVersionId)
+            ->having('max_version_id <= ?', $toVersion)
             ->group([$changelog->getColumnName()])
+            //TASKID: adaptation of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+            ->order('max_version_id ASC')
             ->limit($batchSize);
+        //TASKID: adaptation of https://github.com/magento/magento2/issues/30012#issuecomment-1829364973 approach
+        $select->from($changelogTableName, [$changelog->getColumnName(), 'max_version_id' => new \Zend_Db_Expr('MAX(version_id)')]);

-        $select->from($changelogTableName, [$changelog->getColumnName()]);
-        return $connection->fetchCol($select);
+        return $connection->fetchAll($select);
     }
 }

For me this works like a rocket. We have about 5K products but update them frequently, so the changelogs are huge, more than 200K rows. Now, instead of walking the changelog 1,000 rows at a time, any partial reindex can process the whole catalog in just 5 iterations. Additionally, after every iteration the Mview version is saved to the DB.