magento / magento2

Prior to making any Submission(s), you must sign an Adobe Contributor License Agreement, available here at: https://opensource.adobe.com/cla.html. All Submissions you make to Adobe Inc. and its affiliates, assigns and subsidiaries (collectively “Adobe”) are subject to the terms of the Adobe Contributor License Agreement.
http://www.magento.com
Open Software License 3.0
11.54k stars 9.31k forks source link

Why "search_tmp_" used? and whats use of it? #26184

Closed nagrgk closed 4 years ago

nagrgk commented 4 years ago

15545

https://magento.stackexchange.com/questions/143420/magento2-search-tmp-join-leads-to-empty-results-in-category-listing

Preconditions

There are lot of issues with Magento, biggest is performance issue at category/search. Main issue is creating dynamic tables on run time and causing lot of issues.

  1. create search_tmp temp table when click on category/search
  2. it creates more than 12 or 24 tables for every search you do for any keyword or sometime it goes more than 40.

Steps to reproduce (*)

  1. take any magento2 website and search for any product
  2. it creates dynamic tables and drop it after search is done

Expected result (*)

  1. it should not create temp tables on run time and causing website down

Actual result (*)

  1. it should not create temp tables on run time

questions:

  1. why search_tmp tables introduced in Magento2 , what is reason?
  2. lets says it has been introduced but why is it generated at website run time , in frontend or when you search for product?
  3. why same Magento1 approach not used? is it something being used with scheduling?
  4. How can we not use search_tmp tables?
  5. There are so many queries regarding this, why is it not being answered? is it resolved in latest versions?
  6. Any actions had been taken in Magento2.3.3?
  7. Why it is being used when customer searching products and creating more than 12 tables in single page?
  8. Lets says if search_tmp not completed, page goes to not responding, it ends up creating sleep queries. queries are not being killed which supposed to kill after page render.
  9. is this happening with Elastic search? I dont think so, because I could see issue even after enabling mysql search.

Please let all questions answered because all customers facing this issue. customers are running huge servers to keep website up and running.

Please help.

Thanks nagaraj

m2-assistant[bot] commented 4 years ago

Hi @nagrgk. Thank you for your report. To help us process this issue please make sure that you provided the following information:

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento give me 2.4-develop instance - upcoming 2.4.x release

For more details, please, review the Magento Contributor Assistant documentation.

@nagrgk do you confirm that you were able to reproduce the issue on vanilla Magento instance following steps to reproduce?


nagrgk commented 4 years ago

yes, it is reproducible in vanilla Magento

m2-assistant[bot] commented 4 years ago

Hi @engcom-Hotel. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

engcom-Hotel commented 4 years ago

Hello @nagrgk I am not able to reproduce this issue on a fresh Magento 2.4-develop because search_tmp_ table usage was deprecated. image Magento is using search_query table now: image

So i have to close this issue as not relevant.

Thanks for your report!

hostep commented 4 years ago

@engcom-Hotel, it doesn't mean that if the class is marked as deprecated that it is no longer used within Magento. I can still find at least 3 references in Magento's CatalogSearch module where it is being used.

Also be aware that not every single Magento user is able to run ElasticSearch on their hosting, a lot of Magento users will still use Mysql as the search engine.

I'm reopening, because it was closed for the wrong reason.

nagrgk commented 4 years ago

@engcom-Hotel 2.4 is long way to go. search_temp queries are killing stores.

Before you justify, please think about the performance and business use cases always. here people are very eager to close tickets and not take any actions and not thinking about the end users.

engcom-Hotel commented 4 years ago

@hostep @nagrgk i have to discuss with my team lead about this issue, because we cannot just confirm issues that are not reproducible on the Magento 2.4-develop.

nagrgk commented 4 years ago

@engcom-Hotel looks this is perfect reply. taking time and investigating is always best than just closing.

thanks

m2-assistant[bot] commented 4 years ago

Hi @engcom-Golf. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

magento-engcom-team commented 4 years ago

:white_check_mark: Confirmed by @engcom-Hotel Thank you for verifying the issue. Based on the provided information internal tickets MC-30417 were created

Issue Available: @engcom-Hotel, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

engcom-Hotel commented 4 years ago

Hello @hostep @nagrgk Sorry for my mistake. I did not understand the description correctly. Now we found a problem.

Thanks for collaboration!

arunelangovan commented 4 years ago

Is there any fix or patch so far, Our productions is getting killed atleast once a day.

This is the query we are being slammed with increasing the DB CPU and connection count

SELECT e . * , cat_index . position AS cat_index_position , price_index . price , price_index . tax_class_id , price_index . final_price , IF ( price_index . tier_price IS NOT ? , LEAST ( price_index . min_price , price_index . tier_price ) , price_index . min_price ) AS minimal_price , price_index . min_price , price_index . max_price , price_index . tier_price , IFNULL ( review_summary . reviews_count , ? ) AS reviews_count , IFNULL ( review_summary . rating_summary , ? ) AS rating_summary , stock_status_index . stock_status AS is_salable FROM catalog_product_entity AS e INNER JOIN catalog_category_product_index_store2 AS cat_index ON cat_index . product_id = e . entity_id AND cat_index . store_id = ? AND cat_index . visibility IN ( ?+ ) AND cat_index . category_id = ? INNER JOIN catalog_product_index_price AS price_index ON price_index . entity_id = e . entity_id AND price_index . website_id = ? AND price_index . customer_group_id = ? LEFT JOIN review_entity_summary AS review_summary ON e . entity_id = review_summary . entity_pk_value AND review_summary . store_id = ? AND review_summary . entity_type = ( SELECT review_entity . entity_id FROM review_entity WHERE ( entity_code = ? ) ) LEFT JOIN cataloginventory_stock_status AS stock_status_index ON e . entity_id = stock_status_index . product_id AND stock_status_index . website_id = ? AND stock_status_index . stock_id = ? INNER JOIN search_tmp_5e60fd001dca54_28934909 AS search_result ON e . entity_id = search_result . entity_id ORDER BY cat_index . position ASC , e . entity_id DESC LIMIT ?

ilnytskyi commented 4 years ago

Using ES may help. Or even better to profile the application and eliminate redundant queries on DB dxebug dumps this profiler https://github.com/NoiseByNorthwest/php-spx or xdebug removing join from that table would result with empty filters in layered navigation

arunelangovan commented 4 years ago

@ilnytskyi We are using elastic search, but not sure what went wrong. We get same error. The above query is hitting with DB again whenever we lift up maintenance page

Can you lead me to a possible fix?

arunelangovan commented 4 years ago

@magento-engcom-team @engcom-Hotel Any insight on the above issue?

sshymko commented 4 years ago

@nagrgk @arunelangovan We've experienced the issue with the queries INSERT INTO search_tmp_* overloading the MySQL database hosted at AWS. Normally, such queries execute very quickly in under ~10ms. However, occasionally dramatic spikes are observed in the New Relic APM. https://twitter.com/SergiiShymko/status/1045332638989312001?s=20 DoHDkDbU8AEhqKR

The root cause turned out to be exceeding the Write IOPS limit of RDS database. By default, RDS databases have a General Purpose (SSD) storage. Such databases accrue the IOPS credits at a rate dependent on the storage size. For a 100G database, the limit is 300 Write IOPS. If the usage exceeds the credits accumulation rate, all credits will be used up and AWS will artificially limit the database throughput. Performance will be restored once enough credits have been accumulated. You can see your Write IOPS stats in the RDS monitoring charts. aws_rds_write_iops_limit

The solution is to upgrade the storage to the Provisioned IOPS. It costs $100/mo for 1000 IOPS. That's a reasonable expense to guarantee reliable handling of the elevated traffic levels.

aws_rds_provisioned_iops
sshymko commented 4 years ago

Related bug #9807

arunelangovan commented 4 years ago

@sshymko Thanks for the explanation

maghamed commented 4 years ago

hey @nagrgk, the main reason why we have dynamically generated search_tmp table is the way how Magento search works right now, and in particular that internally our Search API returns the only IDs of matched documents, but not documents by themselves.

The simplified algorithm is quite simple.

  1. User makes request specifying needed filters and providing search phrase
  2. Magento builds either MySQL (deprecated since 2.4.0) or Elasticsearch Full-Text Search query and processes this request. As a result a bunch (up to 10 000 results) of document IDs which correspond to the search query is returned
  3. Then Magento has to refine data for those entities (currently we support only product search). So, the system needs to retrieve data for the bunch of product IDs above. There are two options here: a. Make a query similar toSelect ... from catalog_product_entity where entity_ID in (lis_of_ids_here) b. Store the IDs into a temporary table and then make a query with join, similar to select .. from catalog_product_entity inner join search_tmp where catalog_product_entity.entity_id = search_tmp.entiry_id

The last option is more efficient if the number of returned documents is high because MySQL would use an index for joining tables, while in case of the query with IN (many_ids) - most probably query optimizer will decide to end up with disk scan.

The behavior is the same for both adapters MySQL and Elasticsearch because currently both of them follow the same API (read return IDs of documents found, but not data itself).

Regarding the spikes you mentioned @sshymko, we noticed similar spikes on some installations before. It was really hard to reproduce those spikes while testing, and as far as I remember those issues were specific to particular MySQL/Maria version and been solved with an upgrade.

sivaschenko commented 4 years ago

The issue may be not actual as MySQL was removed from Magento version 2.4.0 and Elasticsearch should be used instead

m2-assistant[bot] commented 4 years ago

Hi @gabrieldagama. Thank you for working on this issue. Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please, go though the following instruction:


gabrieldagama commented 4 years ago

Closing this issue because it is not actual since MySQL was removed from Magento version 2.4.0 and Elasticsearch should be used instead.

hws47a commented 4 years ago

I believe all the comments above state that the issue is still present even when you have Elasticsearch enabled. It's also reproduced in 2.4

gabrieldagama commented 4 years ago

Hi @hws47a, I've done some investigation on this before closing, but maybe I'm missing something.

As far as I understood Magento\Framework\Search\Adapter\Mysql\TemporaryStorage is being used by Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier which is an implementation of Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplierInterface and it's preference is being overwriten by ElasticSearch's implementation Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier.

On my investigation, from what I could see, the ids that come from ElasticSearch are filtered with an e.entity_id IN (?) condition, as shown on Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier line 73, as a result we are getting the following query:

SELECT `e`.*,
       `price_index`.`price`,
       `price_index`.`tax_class_id`,
       `price_index`.`final_price`,
       IF(price_index.tier_price IS NOT NULL, Least(price_index.min_price,
                                              price_index.tier_price),
       price_index.min_price)                   AS `minimal_price`,
       `price_index`.`min_price`,
       `price_index`.`max_price`,
       `price_index`.`tier_price`,
       Ifnull(review_summary.reviews_count, 0)  AS `reviews_count`,
       Ifnull(review_summary.rating_summary, 0) AS `rating_summary`,
       `stock_status_index`.`stock_status`      AS `is_salable`
FROM   `catalog_product_entity` AS `e`
       INNER JOIN `catalog_product_index_price` AS `price_index`
               ON price_index.entity_id = e.entity_id
                  AND price_index.customer_group_id = 0
                  AND price_index.website_id = '1'
       INNER JOIN `catalog_product_website` AS `product_website`
               ON product_website.product_id = e.entity_id
                  AND product_website.website_id = 1
       LEFT JOIN `review_entity_summary` AS `review_summary`
              ON e.entity_id = review_summary.entity_pk_value
                 AND review_summary.store_id = 1
                 AND review_summary.entity_type = (SELECT
                     `review_entity`.`entity_id`
                                                   FROM   `review_entity`
                                                   WHERE  ( entity_code =
                                                            'product' ))
       INNER JOIN `cataloginventory_stock_status` AS `stock_status_index`
               ON e.entity_id = stock_status_index.product_id
                  AND stock_status_index.website_id = 0
                  AND stock_status_index.stock_id = 1
WHERE  ( stock_status_index.stock_status = 1 )
       AND ( e.entity_id IN ( 1 ) )
ORDER  BY Field(e.entity_id, 1) 

I've tried to find any usage of this temporary table but wasn't able to. Can you please provide more details on why you believe this is still actual?

Thanks

gabrieldagama commented 4 years ago

Closing this issue again, please, if you find evidence that the search_tmp is being used provide more details here and re-open the issue.

Thank you.

webrod commented 3 years ago

Hi, on magento 2.4.1, tmp table is still used, see logged query in catalog page:

SELECT COUNT(DISTINCT e.entity_id)
FROM `catalog_product_flat_1` AS `e`
INNER JOIN `catalog_product_index_price` AS `price_index` ON price_index.entity_id = e.entity_id AND price_index.customer_group_id = 0 AND price_index.website_id = '1'
INNER JOIN `catalog_category_product_index_store1` AS `cat_index` ON cat_index.product_id=e.entity_id AND cat_index.store_id=1 AND cat_index.visibility IN(3, 4) AND cat_index.category_id=1997
INNER JOIN `cataloginventory_stock_status` AS `stock_status_index` ON e.entity_id = stock_status_index.product_id
INNER JOIN `search_tmp_5fa7aa3ad505c7_73512525` AS `search_result` ON e.entity_id = search_result.entity_id
WHERE (stock_status_index.stock_status = 1)

EDIT: actually this seems to be linked to my new theme which includes some modules and somehow still use search_tmp table :-( With LUMA them, I don't see search_tmp in queries aymore

scott-xigen commented 3 years ago

Hi @hws47a, I've done some investigation on this before closing, but maybe I'm missing something.

As far as I understood Magento\Framework\Search\Adapter\Mysql\TemporaryStorage is being used by Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier which is an implementation of Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplierInterface and it's preference is being overwriten by ElasticSearch's implementation Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier.

On my investigation, from what I could see, the ids that come from ElasticSearch are filtered with an e.entity_id IN (?) condition, as shown on Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier line 73, as a result we are getting the following query:

SELECT `e`.*,
       `price_index`.`price`,
       `price_index`.`tax_class_id`,
       `price_index`.`final_price`,
       IF(price_index.tier_price IS NOT NULL, Least(price_index.min_price,
                                              price_index.tier_price),
       price_index.min_price)                   AS `minimal_price`,
       `price_index`.`min_price`,
       `price_index`.`max_price`,
       `price_index`.`tier_price`,
       Ifnull(review_summary.reviews_count, 0)  AS `reviews_count`,
       Ifnull(review_summary.rating_summary, 0) AS `rating_summary`,
       `stock_status_index`.`stock_status`      AS `is_salable`
FROM   `catalog_product_entity` AS `e`
       INNER JOIN `catalog_product_index_price` AS `price_index`
               ON price_index.entity_id = e.entity_id
                  AND price_index.customer_group_id = 0
                  AND price_index.website_id = '1'
       INNER JOIN `catalog_product_website` AS `product_website`
               ON product_website.product_id = e.entity_id
                  AND product_website.website_id = 1
       LEFT JOIN `review_entity_summary` AS `review_summary`
              ON e.entity_id = review_summary.entity_pk_value
                 AND review_summary.store_id = 1
                 AND review_summary.entity_type = (SELECT
                     `review_entity`.`entity_id`
                                                   FROM   `review_entity`
                                                   WHERE  ( entity_code =
                                                            'product' ))
       INNER JOIN `cataloginventory_stock_status` AS `stock_status_index`
               ON e.entity_id = stock_status_index.product_id
                  AND stock_status_index.website_id = 0
                  AND stock_status_index.stock_id = 1
WHERE  ( stock_status_index.stock_status = 1 )
       AND ( e.entity_id IN ( 1 ) )
ORDER  BY Field(e.entity_id, 1) 

I've tried to find any usage of this temporary table but wasn't able to. Can you please provide more details on why you believe this is still actual?

Thanks

What if we're using a third party search like Algolia or Klevu, rather the Elasticsearch, which means we have to have Elasticsearch turned off to work?

convenient commented 1 year ago

@gabrieldagama I might be getting lost in the virtual types in vendor/magento/module-elasticsearch/etc/di.xml but I don't see how that it is injected into vendor/magento/module-catalog-graph-ql/Model/Resolver/Products/DataProvider/ProductSearch.php

I am seeing graphql requests for a productDetail pulling the data from elasticsearch, but then still going through the old Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier causing search_tmp_abc123 queries to be generated.

As a workaround I have created a new module that depends on Magento_Elasticsearch and Magento_CatalogGraphQl. in etc/graphql/di.xml I have put the following to force it to use the correct SearchResultsApplier

<type name="Magento\CatalogSearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplierFactory">
    <arguments>
        <argument name="instanceName" xsi:type="string">Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier</argument>
    </arguments>
</type>

And either i'm missing something massive or I can't see how this graphql resolver would ever work with this implementation.

The ProductSearch resolver constructs the SearchResultInterface, but in a way that never defines size or currentPage https://github.com/magento/magento2/blob/0f9a056c8d83c4f319626b3e56ec52a533999f25/app/code/Magento/CatalogGraphQl/Model/Resolver/Products/DataProvider/ProductSearch.php#L141-L153

This is fine for the deprecated mysql implementation https://github.com/magento/magento2/blob/0f9a056c8d83c4f319626b3e56ec52a533999f25/app/code/Magento/CatalogSearch/Model/ResourceModel/Fulltext/Collection/SearchResultApplier.php#L48-L53

But for elasticsearch we get a Missing required argument $size of Magento\Elasticsearch\Model\ResourceModel\Fulltext\Collection\SearchResultApplier error because size or currentPage are not defined https://github.com/magento/magento2/blob/0f9a056c8d83c4f319626b3e56ec52a533999f25/app/code/Magento/Elasticsearch/Model/ResourceModel/Fulltext/Collection/SearchResultApplier.php#L44-L48

To work around this I've put a plugin in the module ./etc/graphql/di.xml to ensure they always exist

class FixRequiredParameters
{
    /**
     * @param SearchResultApplierFactory $subject
     * @param array $data
     * @return array
     */
    public function beforeCreate(SearchResultApplierFactory $subject, array $data = [])
    {
        if (!isset($data['size'])) {
            $data['size'] = 1;
        }
        if (!isset($data['currentPage'])) {
            $data['currentPage'] = 1;
        }
        return [$data];
    }
}

So far this looks all good and I can load product pages without creating search_tmp queries