magento / magento2

Prior to making any Submission(s), you must sign an Adobe Contributor License Agreement, available here at: https://opensource.adobe.com/cla.html. All Submissions you make to Adobe Inc. and its affiliates, assigns and subsidiaries (collectively “Adobe”) are subject to the terms of the Adobe Contributor License Agreement.
http://www.magento.com
Open Software License 3.0
11.55k stars 9.32k forks source link

Price index performance on very large stores #37700

Open nicka101 opened 1 year ago

nicka101 commented 1 year ago

Preconditions and environment

Steps to reproduce

  1. Reindex catalog_product_index_price
  2. Observe that since 263c17f the performance of deleteOutdatedData is much worse (around 800-1200x slower than it should be in our very large instances with ~30 million prices)

Expected result

Price reindex is workable, and uses indexes for its join in deleteOutdatedData, because they exist

Actual result

It's 800-1200x slower than it should be, making a reindex take multiple days (based on the runtime of the individual queries, but can't tell for sure, as i'm not waiting that long), when on previous versions it took 4 hours

Additional information

When catalog_product_index_price_temp is created, it's based on catalog_product_index_price_tmp , which since 263c17f has had different primary key and indexes to the main table, meaning this JOIN has no indexes to use for the select, and is forced to do a full tablescan

We reproduced it as follows:

  1. All indexes have been switched to realtime mode - bin/magento indexer:set-mode realtime

  2. Then we started indexing only for MSI - bin/magento indexer:reindex inventory

As @vladimirspucko described above, the inventory indexing itself was performed quickly, about a minute for 850k products in our case. But then a plugin came into operation, which starts price indexing in realtime mode: \Magento\InventoryCatalog\Plugin\InventoryIndexer\Indexer\Stock\Strategy\Sync\PriceIndexUpdatePlugin::afterExecuteList()

which calls \Magento\Catalog\Model\Indexer\Product\Price\Processor::reindexList()

And if the mode is not schedule this indexing has never been performed on our installation with 850k products.

Note: if we use schedule mode, then full inventory indexing takes about 50 seconds for all the same 850k products

Release note

No response

Triage and priority

m2-assistant[bot] commented 1 year ago

Hi @nicka101. Thank you for your report. To speed up processing of this issue, make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, Add a comment to the issue:

m2-assistant[bot] commented 1 year ago

Hi @engcom-Dash. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:


m2-assistant[bot] commented 1 year ago

Hi @engcom-Hotel. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

engcom-Hotel commented 1 year ago

Hello @nicka101,

Thanks for the report and contribution!

We have tried to reproduce the issue in 2.4-develop as well as 2.4.4 just prior to 2.4.5 (Before that commit), but it seems the issue is not reproducible for us.

We have tried to reproduce the issue with almost 39360 products, by changing the price of the products but for us the reindexing took a similar time.

Have you checked the same with some profiling tools?

Thanks

DmitryFurs commented 1 year ago

Faced with the same issue on a catalog of 850k products, the proposed solution really speeds up the indexing process.

engcom-Hotel commented 1 year ago

Hello @DmitryFurs,

Thanks for the response!

Can you please let us know the steps to reproduce the issue? Actually, in the first attempt, we were unable to reproduce the issue. You can got through our comment here https://github.com/magento/magento2/issues/37700#issuecomment-1680321607.

Thanks

vladimirspucko commented 1 year ago

We faced the same issue using Magento 2.4.6 + MSI. We have ~2600 products (simple, configurable, bundle) and 9 sources/stocks. In our particular case, the price index itself takes less than 1 minute but the inventory index takes more than 1 hour. With the proposed solution, the inventory index takes less than 2 minutes now.

DmitryFurs commented 1 year ago

Hi @engcom-Hotel,

We reproduced it as follows:

  1. All indexes have been switched to realtime mode - bin/magento indexer:set-mode realtime

  2. Then we started indexing only for MSI - bin/magento indexer:reindex inventory

As @vladimirspucko described above, the inventory indexing itself was performed quickly, about a minute for 850k products in our case. But then a plugin came into operation, which starts price indexing in realtime mode: \Magento\InventoryCatalog\Plugin\InventoryIndexer\Indexer\Stock\Strategy\Sync\PriceIndexUpdatePlugin::afterExecuteList()

which calls \Magento\Catalog\Model\Indexer\Product\Price\Processor::reindexList()

And if the mode is not schedule this indexing has never been performed on our installation with 850k products.

Note: if we use schedule mode, then full inventory indexing takes about 50 seconds for all the same 850k products

engcom-Hotel commented 1 year ago

Hello @DmitryFurs,

Thanks for the quick response!

The issue is reproducible for us in the 2.4-develop branch. We have tried to reproduce the issue with almost 40K products.

Hence confirming the issue.

Thanks

github-jira-sync-bot commented 1 year ago

:white_check_mark: Jira issue https://jira.corp.adobe.com/browse/AC-9470 is successfully created for this GitHub issue.

m2-assistant[bot] commented 1 year ago

:white_check_mark: Confirmed by @engcom-Hotel. Thank you for verifying the issue.
Issue Available: @engcom-Hotel, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

mvenghaus commented 1 year ago

This is really a big issue. My partial price index takes several hours. I solved it by just adding an index to the "catalog_product_index_price_tmp" for "entity_id", "customer_group_id", "website_id"

speller commented 8 months ago

Any updates on this issue? In our case, reindexing of 100k products takes hours... On custom product bulk import, we try to reindex only changed products using the reindexList method. It takes hours. If we do the full reindex with the reindexAll method, it takes a couple of minutes.

speller commented 6 months ago

In Magento 2.4.7, the issue is still there.

Interestingly, there's a quality patch ACSD-56415 Fixes the issue where the performance of the partial price indexing is slowed down due to a DELETE query when the database has a lot of partial price data to index for 2.4.6-p3 which fixes the issue but this patch is not available for 2.4.7. The ACSD-56415 patch does exactly the same as what @mvenghaus mentioned.

I'm surprised that maintainers didn't mention this patch here as a possible solution.

I've adopted the ACSD-56415 patch to 2.4.7 by converting it to a composer patch and can confirm that it fixes the issue on my side. For example, without the patch, on my dev machine, 250 products were reindexed in 10 mins so I didn't even wait for the full reindex. It's clear that it'll take hours for 50K products. The full reindex of the 2M products database took 80 mins. With the patch applied, 50K products were reindexed in 1 minute.

engcom-Hotel commented 6 months ago

Hello @speller,

The patch will be incorporated into the forthcoming release, namely 2.4.8-beta1. But we will update you if this can be incorporated sooner.

Thanks

speller commented 6 months ago

@engcom-Hotel could you also make the patch available for 2.4.7?

engcom-Hotel commented 6 months ago

@speller I will update you on this soon.

andy-igoo commented 4 months ago

This was actually introduced in 2.4.6: https://github.com/magento/magento2/blob/2.4.6/app/code/Magento/Catalog/Model/Indexer/Product/Price/AbstractAction.php

We've experienced the indexing to not complete even though it was left running overnight from the end of the working day to the start of the new one due to a very large amount of prices; there's multiple stores, multiple customer groups and a lot of products.

I can see that the solution proposed at https://github.com/magento/magento2/pull/37701 has been rejected because indexes slow down table writes. However, the table writes are for a temporary table that is only involved in the indexing process and for the most part the tradeoff is better overall.

I also tried changing the sql query itself to improve performance. This worked well, but I've found the best results, for our case anyway, was to combine both the query change and add indexes to the temporary table catalog_product_index_price_tmp as in the pull request https://github.com/magento/magento2/pull/37701 (also linked above).

magento/vendor/magento/module-catalog/etc/db_schema.xml

        <constraint xsi:type="unique" referenceId="CAT_PRD_IDX_PRICE_TMP_ENTT_ID_CSTR_GROUP_ID_WS_ID">
            <column name="entity_id"/>
            <column name="customer_group_id"/>
            <column name="website_id"/>
        </constraint>

Note I've not fully tested data integrity yet with this. Probably be on Monday 17th I get back to look at that. I'll post a patch if all looks good then.

EDIT: See below

andy-igoo commented 4 months ago

This was actually introduced in 2.4.6: https://github.com/magento/magento2/blob/2.4.6/app/code/Magento/Catalog/Model/Indexer/Product/Price/AbstractAction.php

We've experienced the indexing to not complete even though it was left running overnight from the end of the working day to the start of the new one due to a very large amount of prices; there's multiple stores, multiple customer ....

On further testing we've found that the change on the sql query only makes a slight difference so in terms in risk and reward, it's probably better just leaving the query alone.

If anyone needs the patch (based on https://github.com/magento/magento2/pull/37701) to change the table indexes then here's a patch for 2.4.7:

37700_inventoryIndexer.patch

diff --git a/vendor/magento/module-catalog/etc/db_schema_whitelist.json b/vendor/magento/module-catalog/etc/db_schema_whitelist.json
--- a/vendor/magento/module-catalog/etc/db_schema_whitelist.json    2024-06-17 11:12:06.595496943 +0100
+++ b/vendor/magento/module-catalog/etc/db_schema_whitelist.json    2024-06-14 08:27:31.648158048 +0100
@@ -987,7 +987,8 @@
             "final_price": true,
             "min_price": true,
             "max_price": true,
-            "tier_price": true
+            "tier_price": true,
+            "id": true
         },
         "index": {
             "CATALOG_PRODUCT_INDEX_PRICE_TMP_CUSTOMER_GROUP_ID": true,
@@ -995,7 +996,8 @@
             "CATALOG_PRODUCT_INDEX_PRICE_TMP_MIN_PRICE": true
         },
         "constraint": {
-            "PRIMARY": true
+            "PRIMARY": true,
+            "CAT_PRD_IDX_PRICE_TMP_ENTT_ID_CSTR_GROUP_ID_WS_ID": true
         }
     },
     "catalog_category_product_index_tmp": {

diff --git a/vendor/magento/module-catalog/etc/db_schema.xml b/vendor/magento/module-catalog/etc/db_schema.xml
--- a/vendor/magento/module-catalog/etc/db_schema.xml   2024-06-17 11:11:56.495474101 +0100
+++ b/vendor/magento/module-catalog/etc/db_schema.xml   2024-06-14 08:28:02.284226062 +0100
@@ -1645,6 +1645,11 @@
         <constraint xsi:type="primary" referenceId="PRIMARY">
             <column name="id"/>
         </constraint>
+        <constraint xsi:type="unique" referenceId="CAT_PRD_IDX_PRICE_TMP_ENTT_ID_CSTR_GROUP_ID_WS_ID">
+            <column name="entity_id"/>
+            <column name="customer_group_id"/>
+            <column name="website_id"/>
+        </constraint>
     </table>
     <table name="catalog_category_product_index_tmp" resource="default" engine="innodb"
            comment="Catalog Category Product Indexer temporary table">

Apply patch with patch -p1 --forward < 37700_inventoryIndexer.patch

vandark25 commented 2 weeks ago

would it be possible to apply the patch to magento 2.3.7 ? https://github.com/nicka101/magento2/commit/90e4ecc06fa1be08f736c8a5fbe699430e9bc3c7