magento / inventory

Magento Inventory Project (a.k.a MSI)
Open Software License 3.0
336 stars 248 forks source link

Unnecessary conversion of product to SKU and back to ID using GetProductIdsBySkus #1941

Closed Swahjak closed 5 years ago

Swahjak commented 5 years ago

Magento 2.3

Summary

I'm not quite sure as to why, but to me there are some very strange cases where products and converted to SKU's and then back to ID's. Why SKU is used as identifier is a mystery to me (since we have ID's), but this seems to be a Magento core choice.

Correct me if I'm wrong, but is seems as if Magento\InventorySales\Model\IsProductSalableCondition\IsProductSalableConditionChain is called using SKU's (that where extracted form a Product which has an ID) and then in almost every condition the SKU's are converted to ID's.

Condition > $this->getStockItemConfiguration->execute($sku, $stockId); > $this->getLegacyStockItem->execute($sku) > $this->getProductIdsBySkus->execute([$sku])[$sku];

In our current setup this leads to 7 queries

SELECT `catalog_product_entity`.`sku`, `catalog_product_entity`.`entity_id` FROM `catalog_product_entity` WHERE (sku IN ('RC30CBX'))

on a single product page.

This is initially triggered by Magento\CatalogInventory\Observer\AddInventoryDataObserver calling $this->stockHelper->assignStatusToProduct($product);

Examples

Magento\InventoryCatalog\Plugin\CatalogInventory\Helper\Stock\AdaptAssignStatusToProductPlugin

    public function aroundAssignStatusToProduct(
        Stock $subject,
        callable $proceed,
        Product $product,
        $status = null
    ) {
        if (null === $product->getSku()) {
            return;
        }

        try {
            $this->getProductIdsBySkus->execute([$product->getSku()]);

Magento\InventoryCatalog\Plugin\CatalogInventory\Helper\Stock\AdaptAssignStatusToProductPlugin

    public function aroundAssignStatusToProduct(
        Stock $subject,
        callable $proceed,
        Product $product,
        $status = null
    ) {
        if (null === $product->getSku()) {
            return;
        }

        try {
            $this->getProductIdsBySkus->execute([$product->getSku()]);

Proposed solution

I'm not sure if this is working as intended, but the least that could be done is a some kind of caching on Magento\InventoryCatalog\Model\GetProductIdsBySkus.

Swahjak commented 5 years ago

Somewhat related to #1921 since this issue is a result of using SKU instead of ID.

maghamed commented 5 years ago

hi @Swahjak , thanks for your question.

. Why SKU is used as identifier is a mystery to me (since we have ID's), but this seems to be a Magento core choice.

Magento is tending to isolated components design. You can ream more about this here - https://github.com/antonkril/architecture/blob/fdcfafe1d6187f9abaf18af9eec7c10e8067a637/design-documents/service-isolation.md

What it means in preactice that different components (bounded contexts in terms of Domain Driven Design) should communicate to each other via the Service Layer. Also from service layer perspective inernal entity identifiers, especially if we are talking about Database generated IDs are usually meaningless in the scope of another system or sub-system. That's why, for example, importing/exporting products Magento does not expose Product IDs, because for another system (ERP, PIM, etc) these IDs are meaningless.

the similar story is integration of Magento Catalog and Inventory (MSI). These two sub-systems are integrated and refer to products by Natural Identity (SKU)

Magento\InventorySales\Model\IsProductSalableCondition\IsProductSalableConditionChain is called using SKU's (that where extracted form a Product which has an ID) and then in almost every condition the SKU's are converted to ID's.

The service

/**
 * Service which detects whether Product is salable for a given Stock (stock data + reservations)
 *
 * @api
 */
interface IsProductSalableInterface
{
    /**
     * Get is product in salable for given SKU in a given Stock
     *
     * @param string $sku
     * @param int $stockId
     * @return bool
     */
    public function execute(string $sku, int $stockId): bool;
}

which calls under the hood IsProductSalableConditionChain knows nothing about how the code of business logic got this SKU, whether it retrived it from product entity interface which already holds the Product Id, or from another place which does not hold the product id data.

In our current setup this leads to 7 queries

SELECT `catalog_product_entity`.`sku`, `catalog_product_entity`.`entity_id` FROM `catalog_product_entity` WHERE (sku IN ('RC30CBX'))

Correct, but these are very fast queries. Exact match by indexed data applying covering index key. Moreover, indexation layer could be easily added to the service

$this->getProductIdsBySkus->execute

to avoid unneeded queries if transformation SKU -> ID already calculated fro requested SKU

Proposed solution I'm not sure if this is working as intended, but the least that could be done is a some kind of caching on Magento\InventoryCatalog\Model\GetProductIdsBySkus.

Agree with proposed solution, the caching layer should be added to GetProductIdsBySkus service

maghamed commented 5 years ago

the corresponding PR has been merged in 2.3-develop branch and would be bundled into the next patch release of MSI soon

Btw it gave a slight Performance boost.

Swahjak commented 5 years ago

@maghamed First thanks for being so elaborate. I was going to write a rant about how SKU is a useless piece of information for us (it's basically a random set of characters made up by a random employee) but neither you (and certainly not I) are in the position to do anything about that anyway.

the similar story is integration of Magento Catalog and Inventory (MSI). These two sub-systems are integrated and refer to products by Natural Identity (SKU)

I understand, but then again don't. MSI knows it's Magento. We're loading product with ID's, extracting SKU's to load ID's. I'm all for abstraction but should be weighed against the cost. Is this ever going to be used outside Magento? Probably not. Is the API SKU based, yes, but that doesn't mean that internals have to adhere to API logic. Repositories can contain both methods for retrieval by ID and SKU.

Correct, but these are very fast queries

To be blunt; I don't really care about how fast queries are, I care about minimizing overhead. It's 7 queries that, in my opinion, should not have to be executed.

To comment on pull #1960; why not cache individual sku's? Now if I fetch sku a, b and next fetch a it will still have to execute a query. There are also multiple cases where a similar logic would / could apply; GetProductTypesBySkus, GetLegacyStockItem, GetStockBySalesChannel.

maghamed commented 5 years ago

Hi, @Swahjak. It's great to see that people who care share their opinion and if they consider something as incorrect - make changes by yourself and deliver them in Pull Requests! I really appreciate that (in fact not only me but all people in Magento).

Is the API SKU based, yes, but that doesn't mean that internals have to adhere to API logic.

Agree

Repositories can contain both methods for retrieval by ID and SKU.

Agree

We just wanted to make contracts of the most used services of Inventory to work with SKUs:

\Magento\InventorySalesApi\Api\IsProductSalableInterface::execute(string $sku, int $stockId): bool;
\Magento\InventorySalesApi\Api\GetProductSalableQtyInterface::execute(string $sku, int $stockId): float;
\Magento\InventorySalesApi\Api\IsProductSalableForRequestedQtyInterface::execute(
    string $sku,
    int $stockId,
    float $requestedQty
): \Magento\InventorySalesApi\Api\Data\ProductSalableResultInterface;

and don't duplicate these services with Product Ids, because in this case 3rd party developers would be bothered pluginizing both entry points instead of the only one.

I care about minimizing overhead. It's 7 queries that, in my opinion, should not have to be executed.

Agree that we no need to calculate what has already been calculated, especially 6-7 times, no matter how fast it is. It's just following the DRY principle.

why not cache individual sku's? Now if I fetch sku a, b and next fetch a it will still have to execute a query.

This is the most interesting question. I was thinking a bit about it while reviewing PR #1960 My point here is that taking into account that our cache is valid just in a scope of single HTTP request, it does not make sense to introduce more sophisticated caching layer which is able to store data on the very low granularity level (per one product SKU -> ID). Because we don't have such business scenarious where in the scope of one HTTP request processing we need to resolve SKUs -> IDs on the bulk level, and then apply resolving item by item. If we had a caching layer that supposed to be re-used by different HTTP requests - I am totally agree that in this particular case we have to deal with the most granular level possible. But introducing this verbosity we can get some extra complexity now, especially working with Bulk resolving.

Let me make a code review of your PR

maghamed commented 5 years ago

Just finished Code Review of your PR - https://github.com/magento-engcom/msi/pull/1966#pullrequestreview-188329498

do you accept my arguments for cache granularity or you still see a necessity in the low granular caching layer?