magento / magento2

Prior to making any Submission(s), you must sign an Adobe Contributor License Agreement, available here at: https://opensource.adobe.com/cla.html. All Submissions you make to Adobe Inc. and its affiliates, assigns and subsidiaries (collectively “Adobe”) are subject to the terms of the Adobe Contributor License Agreement.
http://www.magento.com
Open Software License 3.0
11.59k stars 9.32k forks source link

Product Import - Category data, including min position, is loaded; even if COL_CATEGORY is not set (Performance / Extra Work) #39422

Open steven-hoffman-jomashop opened 1 day ago

steven-hoffman-jomashop commented 1 day ago

Preconditions and environment

Steps to reproduce

  1. Create a csv file with two (2) columns
    • sku and any other column should be enough
    • 1-2 rows should be enough
  2. Run the product import logic with append option selected
    • (Both the API and admin panel import should exhibit this behavior)

Expected result

Extra logic related to loading / saving category data will not be called

Actual result

  1. Logic related to finding the min position is called
  2. Logic related to insert, on duplicate key update, is called for the existing category product relations
  3. Logic related to loading the category collection for the entire site is called

Additional information

If the above three sets of calls; when the product to category table (catalog_category_product) is large loading the min position is the most expensive. This is because their is no index on position along with category_id and the query needs to scan all entries for that category. SELECT MIN(position) AS `position` FROM `catalog_category_product` WHERE (category_id = ?)

The insert of existing data is un-needed, but seems to cost less.

Logic related to loading the category collection for the entire site is called once in bulk; it costs a flat amount per import depending on the number of categories in the site.


Code related notes

https://github.com/magento/magento2/commit/5da1c357bfd9 appears to introduce most of the issue for 1 and 2 above. It changes processRowCategories to load the existing category product data from the product. The change likely was intended make the data available for AfterImportDataObserver as generating the new urls appears to require the existing categories. (It seems, that the data made available to AfterImportDataObserver will differ if a COL_CATEGORY is set, as if it is, the product's existing category data will not be loaded).

The issue is that Product::saveProductCategoriesPhase adds it directly into the categoriesCache. Without differentiating between rowData and existing/loaded product data. (And Product::_saveProductCategories is passed categoriesCache with both rowData and existing product category associations and calling getProductCategoriesDataSave) which calls the min position query and the insert, even if COL_CATEGORY is not present in the import's columns)

For 3 above, CategoryProcessor calls initCategories during __construct.


Possible fixes: 1) saveProductCategoriesPhase and processRowCategories can be adjusted, where processRowCategories returns some metadata to differentiate rowData from product loaded data. Then saveProductCategoriesPhase can add to categoriesCache with the value false. Then _saveProductCategories‎ can be called with only the 'true' values. 2) Alternatively, processRowCategories can be adjusted to not load the product's data, and getProductCategories can be adjusted to load the product data if rowData is missing. (processRowCategories could also load the product data into a sperate property and getProductCategories can use which ever property has data available).

Release note

No response

Triage and priority

m2-assistant[bot] commented 1 day ago

Hi @steven-hoffman-jomashop. Thank you for your report. To speed up processing of this issue, make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce.

m2-assistant[bot] commented 4 hours ago

Hi @engcom-Hotel. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down: