Open steven-hoffman-jomashop opened 1 day ago
Preconditions and environment
`dev:query-log:enable` will show the extra queries
Steps to reproduce
Expected result
Extra logic related to loading / saving category data will not be called
Actual result
Additional information
Of the above three sets of calls, loading the min position is the most expensive when the product to category table (catalog_category_product) is large. This is because there is no index on position along with category_id, so the query needs to scan all entries for that category.
SELECT MIN(position) AS `position` FROM `catalog_category_product` WHERE (category_id = ?)
The insert of existing data is unneeded, but seems to cost less.
Logic related to loading the category collection for the entire site is called once in bulk; it costs a flat amount per import depending on the number of categories in the site.
Code related notes
https://github.com/magento/magento2/commit/5da1c357bfd9 appears to introduce most of the issue for 1 and 2 above. It changes processRowCategories to load the existing category product data from the product. The change was likely intended to make the data available to AfterImportDataObserver, as generating the new URLs appears to require the existing categories. (It seems that the data made available to AfterImportDataObserver will differ depending on whether COL_CATEGORY is set: if it is, the product's existing category data will not be loaded.)
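To make the branch described above easier to follow, here is a minimal standalone sketch of how I understand the behaviour. It is not the Magento code: `processRowCategoriesSketch`, the `$pathToId` map, the `'categories'` column name and the comma separator are all assumptions made up for illustration.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for the import's category column; an assumption, not Magento's constant.
const COL_CATEGORY = 'categories';

/**
 * Hypothetical model of processRowCategories after the referenced commit:
 * use the row's categories when the column is filled, otherwise fall back
 * to the category ids already linked to the product.
 *
 * @param array<string, string> $rowData             one import row
 * @param int[]                 $existingCategoryIds ids already linked to the product
 * @param array<string, int>    $pathToId            hypothetical category path => id map
 * @return int[]
 */
function processRowCategoriesSketch(array $rowData, array $existingCategoryIds, array $pathToId): array
{
    if (!empty($rowData[COL_CATEGORY])) {
        $ids = [];
        // Assumed separator: a comma between category paths.
        foreach (explode(',', $rowData[COL_CATEGORY]) as $path) {
            $path = trim($path);
            if (isset($pathToId[$path])) {
                $ids[] = $pathToId[$path];
            }
        }
        return $ids;
    }

    // No category column in the row: the existing links are returned instead.
    return $existingCategoryIds;
}

$pathToId = ['Default Category/Watches' => 10, 'Default Category/Sale' => 20];
$existing = [10, 20];

// Row that sets categories: only the row's categories come back.
$withColumn = processRowCategoriesSketch(
    ['sku' => 'A1', COL_CATEGORY => 'Default Category/Sale'],
    $existing,
    $pathToId
);

// Price-only row: the product's existing categories come back.
$withoutColumn = processRowCategoriesSketch(
    ['sku' => 'A1', 'price' => '9.99'],
    $existing,
    $pathToId
);

var_dump($withColumn, $withoutColumn);
```

In this model the price-only row still returns the existing ids, and the caller cannot tell them apart from ids that came from the row.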
The issue is that Product::saveProductCategoriesPhase adds it directly into the `categoriesCache`, without differentiating between rowData and existing/loaded product data. (Product::_saveProductCategories is then passed `categoriesCache` with both rowData and existing product category associations, and calls getProductCategoriesDataSave, which runs the min position query and the insert even if COL_CATEGORY is not present in the import's columns.)
For 3 above, CategoryProcessor calls initCategories during `__construct`.
Possible fixes:
1) `saveProductCategoriesPhase` and `processRowCategories` can be adjusted, where processRowCategories returns some metadata to differentiate rowData from product loaded data. Then `saveProductCategoriesPhase` can add product loaded data to `categoriesCache` with the value false. Then `_saveProductCategories` can be called with only the 'true' values.
2) Alternatively, `processRowCategories` can be adjusted to not load the product's data, and `getProductCategories` can be adjusted to load the product data if rowData is missing. (`processRowCategories` could also load the product data into a separate property and `getProductCategories` can use whichever property has data available.) (`AfterImportDataObserver` has its own checks on rowData, and in many cases loading the product to get the existing category data will not be needed.)
`CategoryProcessor`: if `clearFailedCategories` (https://github.com/magento/magento2/blob/2.4.8-beta1/app/code/Magento/CatalogImportExport/Model/Import/Product.php#L1641C47-L1641C68) is called inside `processRowCategories` after the check on `!empty($rowData[self::COL_CATEGORY])`
Release note
No response
Triage and priority