pimcore / ecommerce-framework-bundle

Ecommerce Framework community bundle provides e-commerce functionality such as product listing and filtering, pricing, carts and checkouts for Pimcore.
https://pimcore.com/docs/platform/Ecommerce_Framework/

Ecommerce | Index Bootstrap Command #83

Open andreas-gruenwald opened 4 years ago

andreas-gruenwald commented 4 years ago

Feature Request

I have another idea: we could add a mode to the bootstrap command that only creates "empty" ID rows for IDs that have no row yet, instead of processing the whole object(s):

https://github.com/pimcore/pimcore/blob/d4bf70250202c56fb81ece2d80a85c282daf67ac/bundles/EcommerceFrameworkBundle/Command/IndexService/BootstrapCommand.php#L131

The store table would then be used as a queue, and the ProcessPreparationQueue command would take care of the rest.

This would depend on pimcore/pimcore#6487 though.

fashxp commented 4 years ago

might be challenging due to

so loading the data object will be necessary in any case, and most probably this is the most expensive operation (not extracting data from it for the index)

andreas-gruenwald commented 4 years ago

might be challenging ...

So the effort of loading data objects is the same, but it is shifted from the bootstrap command to the ProcessPreparationQueue, where it is easier to set up.

Maybe there is something I do not see yet.

fashxp commented 4 years ago

but for knowing what IDs should be in index, you need to load the data object ... and the whole point of bootstrapping is knowing what IDs should be in index. processing preparation queue command already needs to know what IDs are in index.

or am I missing something?

andreas-gruenwald commented 4 years ago

Here is the difference between the two approaches. Let's assume that we have a system with 600.000 products, of which 200.000 are relevant for the product index.

Current mode:

BootstrapCommand:

  1. Load the product ID list (will result in 600.000 IDs) (cost: low).
  2. Iterate the IDs and load the 600.000 data objects (cost: high).
  3. If a product is in index, then add a store-table entry, otherwise remove (existing) rows (cost: low).

For 600.000 products, without parallelization, let's say that the command will run for 48 hours in a project where the product data model is complex.


Alternative/additional mode with BootstrapCommand and ProcessPreparationQueue:

BootstrapCommand:

  1. Load the product ID list (will result in 600.000 IDs) (cost: low).
  2. Iterate the IDs and add an empty ID row if no entry exists yet (cost: low).

Because the data objects are not loaded in this step, the process will probably terminate after a couple of minutes instead of hours/days.
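The lightweight bootstrap step above can be sketched roughly as follows. This is a minimal illustration in Python with SQLite, not the bundle's code: the table name `store`, the columns `id` and `data`, and the helper `enqueue_missing_ids` are hypothetical, and only the `in_preparation_queue` flag is taken from this thread. It assumes the store table has a unique key on the product ID, so that existing rows are skipped without loading any data object.

```python
import sqlite3


def enqueue_missing_ids(conn, product_ids):
    """Insert an empty, queued row for every ID that has no store row yet.

    Rows that already exist are left untouched (INSERT OR IGNORE relies on
    the primary key). Returns the number of rows that were added.
    """
    before = conn.execute("SELECT COUNT(*) FROM store").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO store (id, data, in_preparation_queue) "
        "VALUES (?, NULL, 1)",
        [(pid,) for pid in product_ids],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM store").fetchone()[0]
    return after - before


# Hypothetical, simplified store table for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store ("
    "id INTEGER PRIMARY KEY, "
    "data TEXT, "
    "in_preparation_queue INTEGER NOT NULL DEFAULT 0)"
)
# Product 1 is already prepared and must stay as it is.
conn.execute("INSERT INTO store VALUES (1, 'prepared', 0)")

added = enqueue_missing_ids(conn, [1, 2, 3])
```

In this sketch only IDs 2 and 3 get new empty rows; the existing row for ID 1 keeps its data and is not re-queued.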

ProcessPreparationQueue:

  1. All the empty rows that have been added by the BootstrapCommand above will be picked up, because they are flagged with in_preparation_queue=1 (cost: low).
  2. Those entries that haven't been added to the index before will be processed (let's assume those are 400.000). If a product belongs in the index, then the row will be updated, otherwise it will be deleted (cost: high).
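The queue processing described in these two steps could look roughly like the sketch below. Again this is Python with SQLite for illustration only, under stated assumptions: the table layout, `load_product`, and `process_queue` are hypothetical stand-ins, and the rule "even IDs belong in the index" exists only to make the demo deterministic. The point it tries to show is that a second run picks up only the rows that are still open.

```python
import sqlite3


def load_product(product_id):
    """Hypothetical stand-in for the expensive data object load."""
    # Demo assumption: only even IDs belong in the index.
    return {"id": product_id, "in_index": product_id % 2 == 0}


def process_queue(conn, batch_size):
    """Process up to batch_size open rows and return how many were handled."""
    rows = conn.execute(
        "SELECT id FROM store WHERE in_preparation_queue = 1 "
        "ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for (product_id,) in rows:
        product = load_product(product_id)
        if product["in_index"]:
            # Product belongs in the index: fill the row and close it.
            conn.execute(
                "UPDATE store SET data = ?, in_preparation_queue = 0 "
                "WHERE id = ?",
                (f"product-{product_id}", product_id),
            )
        else:
            # Product does not belong in the index: drop the row.
            conn.execute("DELETE FROM store WHERE id = ?", (product_id,))
        # Commit per row so an interrupted run keeps the work already done.
        conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store ("
    "id INTEGER PRIMARY KEY, "
    "data TEXT, "
    "in_preparation_queue INTEGER NOT NULL DEFAULT 0)"
)
# Six empty rows, as a lightweight bootstrap run would have left them.
conn.executemany(
    "INSERT INTO store VALUES (?, NULL, 1)", [(i,) for i in range(1, 7)]
)

first_run = process_queue(conn, batch_size=4)     # simulates a run that stops early
second_run = process_queue(conn, batch_size=100)  # a restart sees only open rows
remaining = [r[0] for r in conn.execute("SELECT id FROM store ORDER BY id")]
```

Here the first run handles four rows and the restarted run handles only the remaining two, which is the restart behaviour the proposal relies on.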

There are two main differences:

  1. In mode number 2 those rows that are already in index and are already "prepared" will remain untouched. So only the delta will be processed, resulting in less data object reads.
  2. The BootstrapCommand cannot resume if it is stopped unintentionally. Let's assume that the whole processing takes 48 hours, but after 30 hours the Symfony container is rebuilt on the DEV server. The command will stop, and restarting the BootstrapCommand will result in another 48-hour run. With approach number 2 this won't happen, as the ProcessPreparationQueue will only process the "open" records, not all data objects matching the product list condition. Mode number 1 is probably still needed, so both modes could coexist.

markus-moser commented 3 years ago

We should definitely create something similar to what @andreas-gruenwald suggested. Currently it's a very big challenge to do the bootstrapping in projects with many products and tenants.

fashxp commented 3 years ago

are there any BC breaks needed for that?

andreas-gruenwald commented 2 years ago

are there any BC breaks needed for that?

I don't think that BC breaks are needed. Also, the current behavior could remain the default. The best way would probably be to implement it as a pull request driven by a concrete project.

fashxp commented 2 years ago

I would love to see a PR for it :-)