magento / magento2

Prior to making any Submission(s), you must sign an Adobe Contributor License Agreement, available here at: https://opensource.adobe.com/cla.html. All Submissions you make to Adobe Inc. and its affiliates, assigns and subsidiaries (collectively “Adobe”) are subject to the terms of the Adobe Contributor License Agreement.
http://www.magento.com
Open Software License 3.0
11.56k stars 9.32k forks source link

Catalog Category Indexing takes very long on MariaDB 10.3 with many products #25199

Closed amenk closed 4 years ago

amenk commented 5 years ago

Description (*)

I know MariaDB 10.3 is not yet supported, as per dev docs, so I add this as feature requests, while it will manifest as a bug as soon as we support MariaDB 10.3

Magento should remove the quotes in the SELECT entity_id IN statement when creating indexes. The entity_ids are not strings, and this seems to confuse the optimizer.

Steps to reproduce:

  1. Change value for simple products to 50000 for example (setup/performance-toolkit/profiles/ce/small.xml);
  2. Run in console: bin/magento setup:perf:generate-fixtures /var/www/html/magento24/setup/performance-toolkit/profiles/ce/small.xml for example

Expected behavior (*)

Catalog Category Indexing should run quickly, on MariaDB 10.1 and on 10.3.18

Benefits

Shop ownes can use MariaDB 10.3.18

Additional information

See this answer on StackExchange: https://magento.stackexchange.com/a/293345/81

I took the full query which is run several times (for each store view) on the catalog category products indexer and extracted the select part

SELECT 2 AS `category_id`, `cp`.`entity_id` AS `product_id`, IF(ccp.product_id IS NOT NULL, ccp.position, 0) AS `position`, IF(ccp.product_id IS NOT NULL, 1, 0) AS `is_parent`, 2 AS `store_id`, IFNULL(cpvs.value, cpvd.value) AS `visibility` FROM `catalog_product_entity` AS `cp`
 INNER JOIN `catalog_product_website` AS `cpw` ON cpw.product_id = cp.entity_id
 INNER JOIN `catalog_product_entity_int` AS `cpsd` ON cpsd.entity_id = cp.entity_id AND cpsd.store_id = 0 AND cpsd.attribute_id = 97
 LEFT JOIN `catalog_product_entity_int` AS `cpss` ON cpss.entity_id = cp.entity_id AND cpss.attribute_id = cpsd.attribute_id AND cpss.store_id = 2
 INNER JOIN `catalog_product_entity_int` AS `cpvd` ON cpvd.entity_id = cp.entity_id AND cpvd.store_id = 0 AND cpvd.attribute_id = 99
 LEFT JOIN `catalog_product_entity_int` AS `cpvs` ON cpvs.entity_id = cp.entity_id AND cpvs.attribute_id = cpvd.attribute_id  AND cpvs.store_id = 2
 LEFT JOIN `catalog_category_product` AS `ccp` ON ccp.product_id = cp.entity_id WHERE (cpw.website_id = '1') AND (IFNULL(cpss.value, cpsd.value) = 1) AND           (IFNULL(cpvs.value, cpvd.value) IN (2, 3, 4)) AND (cp.entity_id IN ('107280', '107281', '107282', '107283', '107284'
    [... around 70 K entity IDs ...]
    187965', '187966')) GROUP BY `cp`.`entity_id`;

This query alone runs 8 minutes on Maria DB 10.3

When I simply remove all the quotes via s/'//g in vi in the line with the "IN" statement I get

SELECT 2 AS `category_id`, `cp`.`entity_id` AS `product_id`, IF(ccp.product_id IS NOT NULL, ccp.position, 0) AS `position`, IF(ccp.product_id IS NOT NULL, 1, 0) AS `is_parent`, 2 AS `store_id`, IFNULL(cpvs.value, cpvd.value) AS `visibility` FROM `catalog_product_entity` AS `cp`
 INNER JOIN `catalog_product_website` AS `cpw` ON cpw.product_id = cp.entity_id
 INNER JOIN `catalog_product_entity_int` AS `cpsd` ON cpsd.entity_id = cp.entity_id AND cpsd.store_id = 0 AND cpsd.attribute_id = 97
 LEFT JOIN `catalog_product_entity_int` AS `cpss` ON cpss.entity_id = cp.entity_id AND cpss.attribute_id = cpsd.attribute_id AND cpss.store_id = 2
 INNER JOIN `catalog_product_entity_int` AS `cpvd` ON cpvd.entity_id = cp.entity_id AND cpvd.store_id = 0 AND cpvd.attribute_id = 99
 LEFT JOIN `catalog_product_entity_int` AS `cpvs` ON cpvs.entity_id = cp.entity_id AND cpvs.attribute_id = cpvd.attribute_id  AND cpvs.store_id = 2
 LEFT JOIN `catalog_category_product` AS `ccp` ON ccp.product_id = cp.entity_id WHERE (cpw.website_id = '1') AND (IFNULL(cpss.value, cpsd.value) = 1) AND           (IFNULL(cpvs.value, cpvd.value) IN (2, 3, 4)) AND (cp.entity_id IN (107280, 107281, 107282, 107283, 107284
    [... around 70 K entity IDs ...]
    187965, 187966)) GROUP BY `cp`.`entity_id`;

Next steps

I will try to provide a patch.

m2-assistant[bot] commented 5 years ago

Hi @amenk. Thank you for your report. To help us process this issue please make sure that you provided the following information:

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento give me 2.3-develop instance - upcoming 2.3.x release

For more details, please, review the Magento Contributor Assistant documentation.

@amenk do you confirm that you were able to reproduce the issue on vanilla Magento instance following steps to reproduce?


amenk commented 5 years ago

Not able to reproduce, because it's a feature requests for a recent MariaDB 10.3 which is not yet supported by Magento 2.3

amenk commented 5 years ago

I also created an issue in the MariaDB tracker: https://jira.mariadb.org/browse/MDEV-20871

amenk commented 5 years ago

vendor/magento/module-catalog/Model/Indexer/Category/Product/Action/Full.php:287

    foreach ($batchQueries as $query) {
        $this->connection->delete($this->tableMaintainer->getMainTmpTable((int)$store->getId()));
        $entityIds = $this->connection->fetchCol($query);
        foreach($entityIds as &$val) { // patch
            $val = (integer) $val;     // patch
        }                              // patch
        $resultSelect = clone $basicSelect;
        $resultSelect->where($whereCondition, $entityIds);

Or should we put it into \Magento\Framework\DB\Select::where and convert is_numeric values to numbers ?

Digging more down into the function leads to Zend Framework 1 code :(

amenk commented 5 years ago

I tried to patch \Magento\Framework\DB\Select::where

public function where($cond, $value = null, $type = null)
{
    // ensure that integers are passed as integer,
    // to avoid negative effects on performance
    if (is_array($value)) {
        foreach($value as &$val) {
            $val = $this->castNumeric($val);
        }
    } else {
        $value = $this->castNumeric($value);
    }

    if ($value === null && $type === null) {
        $value = '';
    } elseif ($type == self::TYPE_CONDITION) {
        $type = null;
    }
    if (is_array($value)) {
        $cond = $this->getConnection()->quoteInto($cond, $value);
        $value = null;
    }
    return parent::where($cond, $value, $type);
}

private function castNumeric($val) {
    if (is_numeric($val)) {
        return $val + 0;
    }
    return $val;
}

But this does not catch the situation when you edit a categories anchor property -> the update of the url_rewrite index still uses quotes in strings

amenk commented 5 years ago

I will inject the patch into \Magento\Framework\DB\Adapter\Pdo\Mysql::quoteInto

fritzmg commented 5 years ago

We are noticing this problem too with MariaDB 10.3.18 in Magento 2.3.2. We notice that most INSERT queries of the indexer take waaaay longer than expected and especially the following query takes up to 8 hours!

INSERT INTO `catalog_category_product_index_store1_tmp` (`category_id`, `product_id`, `position`, ...

We cannot even do a indexer:reindex because once it arrives at the categories index, the following error will happen after 8 hours:

Product Categories indexer process unknown error:
SQLSTATE[42S02]: Base table or view not found: 1146 Table 'catalog_category_product_index_store1_tmp' doesn't exist, query was: DELETE FROM `catalog_category_product_index_store1_tmp`

So apparently it takes so long, that the temporary table "vanishes" in between (?).

amenk commented 5 years ago

@fritzmg Feel free to try my patch https://github.com/magento/magento2/pull/25212

amenk commented 5 years ago

@fritzmg Which MariaDB version were you on before it became so slow?

fritzmg commented 5 years ago

Before it was actually MySQL 5.7.27. Since this is happening in a live system (hoster had to switch DB server due to other troubles) we can't unfortunately test your fix (yet) and thus our hoster is in the process of downgrading the MariaDB version again.

amenk commented 5 years ago

What we did was, to locally setup 10.3.18 in docker to reproduce the bug and test the fix.

fritzmg commented 5 years ago

Yes, I've seen that on Stackoverflow :). When we get around to it, we will try the same.

mpingu commented 5 years ago

@amenk can you try it with MariaDB 10.3.17 also, i got a case where it was in 10.3.18 slow and in 10.3.17 fast. My currently suspect is a regession because of https://jira.mariadb.org/browse/MDEV-20109, see the other Reports linked there.

fritzmg commented 5 years ago

We are noticing similar problems with MariaDB 10.2.27 - just not with the queries of the indexer. Certain selects take up to 60 seconds while they should only last milliseconds.

ghost commented 5 years ago

I'm also running mariadb 10.3.18 and having these issues. Indexers just never stop being triggered now even after tripling our hardware speed.

Attempting to clone a server then upgrade to 10.4 on it and test indexing on it.

ghost commented 5 years ago

10.4 latest made zero difference. @amenk patch reduced it from 45 mins to 40 seconds for the query he posted. Removing IN for entity_id altogether reduces query time to 0.0013 seconds

M2 mview and indexer system are once again trash and once again costing my client money and me time.

amenk commented 5 years ago

How did you remove "IN" for entity_id altogether? Did you still select the same rows? Via a JOIN?

Please maintain a constructive tone ;-)

ghost commented 5 years ago

I just manually changed the query for testing, I'm researching now if the entity_id's are do to a batch or handed off from a prior query. If it's just the mview batching, then it's 30,000 faster to not batch than to do it.

ghost commented 5 years ago

It does select all products and then batch them for a root category full reindex. Here's my result after removing batching entirely for that: Category Products index has been rebuilt successfully in 00:00:14

Before it was 5 hours.

Will test some more then release a module to override default m2 behavior. Honestly, should probably just write a module to overwrite all indexers at some point they are terrible I have worked on this project nearly 3 years and this is a constant source of suffering. Many other ecommerce systems don't have something like this for good reason, just more examples of overengineering to make stuff worse and more costly to develop, run, and maintain. I can't keep a positive tone because I have nothing positive to say about it, and I doubt you would either if your quality of life and that of your clients was worsened by poorly designed magento2. If we could switch off it we would but sunk cost is a lot at this point and it's as I said a few years work.

ghost commented 5 years ago

https://github.com/SoloDevAtrix/snippets/blob/master/indexer_catalog_category_products.php runs in :02 seconds on the same database, my frontend functions perfect with it.

My original glance over that it selected all products for root was wrong they actually do quite a bit of specific join conditions on all types. I have my database matching a full run pretty closely. Only difference is a few positions are adjusted in my view, for the better, as well as don't have weird one off where it'll put in entries into the index for things like A product with cat path 1/2/4/142/160 normally it'd put in the 160 the 2 and 4 but 142 would be skipped by m2. but in some cases it'd put it in for no reason. i have the option in my code to put it in for all or none. was the only diff I could find in the results of the code output into the database outside slight position stuff as I said.

Enabling putting in all the non parent non leaf options doubles the size as well as seems to achieve nothing. So left it off.

If you use it power to you, I offer zero warranty, even implied, and you're on your own to get it to work. It's also alpha code at best so if you run into a bug then you ran into a bug cuz it's just finished with it's first test run.

I just think it's really sour how badly m2 code runs this. You're filling ONE TABLE.

Here's the reasons why m2's queries are SLOW SLOW SLOW:

  1. Use of IN to do batching. This changes mysql joins from better join types to a range type which is slower.
  2. Redoing tasks because of batching. Stop batching your shit unless it's necessary and if it is then don't use IN statements.
  3. Using temporary tables, almost always bad for performance. Avoid using them ever.
  4. Tons of self joins, you had way way way too many and some of them served almost no purpose.
  5. Forcing all the work on mysql. Sometimes this is efficient. Sometimes it's better to write a loop or do work in PHP. Some things are so complex the tradeoff for EVERYONE that has to work on it is just better to chunk it up into code sections in php. This seems to use mysql or be damned and it be damned.
  6. way too much data from all those joins. simple is better. Adding one bad join can spike usage from .0001 seconds to some query that can take 3+mins to run. Use EXPLAIN and understand what it says.

Last note, my working example code, doesn't follow m2 ideology. At all. If you guys cared as much about pragmatic efficient solutions as you did your cursed ideology then maybe m2 wouldn't be a garbage heap.

This is a serious problem for my client, it's actually maybe costing him his entire business.

The indexers lock up the database, adding hardware barely helps. Locked up database means slow loads on any page not already cached in varnish. Cache warming helps but can't do every page on the site. Leads to lower conversions. He can't fix or edit products because again, it's so slow. And when he does, he gets locked out because hours of indexers are running. So if my language is harsh, that's why.

I've worked for this client oh, 9 years now. The switch to magento2 is by far the worst decision that's ever been made for the company.

amenk commented 4 years ago

set in_predicate_conversion_threshold = 4294967295; set global in_predicate_conversion_threshold = 4294967295;

seems also to help, as a workaround.

See: https://jira.mariadb.org/browse/MDEV-20871?focusedCommentId=138523&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-138523

There might be a fix in MariaDB 10.3.21

bve-9384 commented 4 years ago

Migrated to MariaDB 10.4.12 on my M2.1.7 install with 6000 products (centos 8, php7.0) Indexing seems to be OK, plan to migrate to M2.3 within the next few weeks

magento indexer:reindex
Design Config Grid index has been rebuilt successfully in 00:00:00
Customer Grid index has been rebuilt successfully in 00:00:00
Product Flat Data index has been rebuilt successfully in 00:00:01
Category Flat Data index has been rebuilt successfully in 00:00:00
Category Products index has been rebuilt successfully in 00:00:01
Product Categories index has been rebuilt successfully in 00:00:00
Product Price index has been rebuilt successfully in 00:00:02
Product EAV index has been rebuilt successfully in 00:00:03
Stock index has been rebuilt successfully in 00:00:00
Catalog Rule Product index has been rebuilt successfully in 00:00:00
Catalog Product Rule index has been rebuilt successfully in 00:00:00
Catalog Search index has been rebuilt successfully in 00:01:01
nobodyfamouz commented 4 years ago

set in_predicate_conversion_threshold = 4294967295; set global in_predicate_conversion_threshold = 4294967295;

seems also to help, as a workaround.

See: https://jira.mariadb.org/browse/MDEV-20871?focusedCommentId=138523&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-138523

There might be a fix in MariaDB 10.3.21

Thanks to this indexing of category products went from impossible (7+ hours) to 10 seconds! Thanks a lot!

magento-engcom-team commented 4 years ago

:white_check_mark: Confirmed by @engcom-Alfa Thank you for verifying the issue. Based on the provided information internal tickets MC-37309 were created

Issue Available: @engcom-Alfa, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

magento-engcom-team commented 4 years ago

Hi @amenk. Thank you for your report. The issue has been fixed in magento/magento2#27129 by @kandy in 2.4-develop branch Related commit(s):

The fix will be available with the upcoming 2.4.2 release.

harrigo commented 3 years ago

I seem to be getting a similar problem, show processlist shows lots of things randomly sending data for what seems like forever. Originaly seemed to start happening with related products but my query for that was slightly modified and thought it was my fault however changed it and it went away.

Now however am noticing queries hanging indefinitely when running schema recurring for Magento_Indexer section of setup upgrade...

I'm on 2.3.7 and commits here are for 2.4.2 will there be a similar fix available for 2.3.x as hoping it is a similar issue?

xantek-imc commented 2 years ago

I had this issue with 2.3.7-p3 and MariaDB 10.4.24 Using MariaDB 10.3.34 solved the issue.

realadityayadav commented 1 year ago

For anybody on Maria db 10.4 and experiencing long reindexing times please do https://experienceleague.adobe.com/docs/commerce-operations/performance-best-practices/configuration.html

optimizer_switch='rowid_filter=off' optimizer_use_condition_selectivity = 1

vasilii-b commented 1 year ago

What @realadityayadav told helped a bit in the situation I've been.

Besides that, a huge impact had the changes for in_predicate_conversion_threshold as suggested by @nobodyfamouz .

The combo I had to deal with: MariaDb 10.4.27 Magento 2.3.7-p3

Reno-Axel commented 1 year ago

Hi, I was struggling with the same problem, catalog_category_product index take too long, I resolve it by reducing batchRowsCount:

`

500

`

MariaDB: 10.4.29 Magento: 2.4.2-p2

RakeshJesadiya commented 1 year ago

I have improved the Magento category indexer performance by adding a given parameter to the Warden configuration yaml file by given link. https://www.rakeshjesadiya.com/improve-category-indexer-performance-local-setup-by-warden-magento/

erikhansen commented 3 months ago

@RakeshJesadiya Thanks for the tip. My DB instance gave me this error:

mysqld: Error while setting value '"rowid_filter=off"' to 'optimizer_switch'

So instead in the.warden/warden-env.yml file I added this:

services:
  db:
    volumes:
      - ./.warden/mysql/mysql_configuration.cnf:/etc/mysql/mariadb.conf.d/99-mysql_configuration.cnf

And then in .warden/mysql/mysql_configuration.cnf:

[mysqld]
max_allowed_packet=1024M
explicit_defaults_for_timestamp=on
optimizer_search_depth=4
optimizer_use_condition_selectivity=1
optimizer_switch='rowid_filter=off'