andreas-gruenwald commented 4 years ago

Performance Improvement

Problem Summary

In one of our projects, retrieving a simple category filter takes more than 1.2 seconds. I found that some lines of code and data structures used for loading Elasticsearch aggregations can be improved. In particular, converting aggregations and buckets in the Elasticsearch product list seems to consume a lot of time for large datasets (>20,000 categories in total; 20 on the first level).

Problem Details

This is the initial test code:

public function testControllerAction() {
        // $rootCategory, $currentCategory, $renderingScriptPath and
        // $forceRootCategorySetup are not defined in this excerpt.
        $ecFactory = \Pimcore\Bundle\EcommerceFrameworkBundle\Factory::getInstance();

        $filter = new \Pimcore\Model\DataObject\Fieldcollection\Data\FilterCategory();
        $filter->setRootCategory($rootCategory);
        $filter->setScriptPath($renderingScriptPath);
        $filter->setIncludeParentCategories(true);
        $filterService = $ecFactory->getFilterService($ecFactory->getEnvironment()->getCurrentAssortmentTenant());

        $productList = $ecFactory->getIndexService()->getProductListForCurrentTenant();
        $productList->prepareGroupBySystemValues('system.parentCategoryIds', true);

        $productList->setCategory(!$forceRootCategorySetup ? $currentCategory : $rootCategory);

        $renderedCategoryNav = $filterService->getFilterFrontend(
            $filter,
            $productList,
            [
                'categoryIds' => $rootCategory->getId(),
                'env' => $this->getEnv(),
                'actualCurrentCategoryId' => $currentCategory->getId()
            ]);
}

I dug into ProductList\ElasticSearch\AbstractElasticSearch and doLoadGroupByValues and found that extracting the filter aggregations and buckets is very time-intensive.

The following code sequence took 568 ms (with the debugger enabled).

if ($result['aggregations']) {            
                foreach ($result['aggregations'] as $fieldname => $aggregation) {
                    $buckets = $this->searchForBuckets($aggregation);
                    $groupByValueResult = [];
                    if ($buckets) {
                        foreach ($buckets as $bucket) {
                            if ($this->getVariantMode() == self::VARIANT_MODE_INCLUDE_PARENT_OBJECT) {
                                $groupByValueResult[] = ['value' => $bucket['key'], 'count' => $bucket['objectCount']['value']];
                            } else {
                                $data = $this->convertBucketValues($bucket); // support subaggregations
                                $groupByValueResult[] = $data;
                            }
                        }
                    }
                    $this->preparedGroupByValuesResults[$fieldname] = $groupByValueResult;
                }
            }

Solution Concept

I used the following code to demonstrate that the retrieval could be done much faster: 32ms (vs. original 568ms).

if ($result['aggregations']) {
                foreach ($result['aggregations'] as $fieldname => $aggregation) {
                    // Assumes the buckets sit directly on the aggregation (no nesting).
                    $buckets = $aggregation['buckets'];
                    // Rename the keys on the JSON string instead of converting bucket by bucket.
                    $json = json_encode($buckets);
                    $json = str_replace('key":', 'value":', $json);
                    $json = str_replace('doc_count":', 'count":', $json);
                    $buckets = json_decode($json, true);
                    $this->preparedGroupByValuesResults[$fieldname] = $buckets;
                }
}

With the debugger disabled, it is still 27 ms vs. 94 ms!
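To make the idea reproducible outside Pimcore, here is a minimal, self-contained sketch of the same key-renaming trick. The sample response and the function name `renameBucketKeys` are assumptions made up for illustration; only the `key`/`doc_count` to `value`/`count` mapping comes from the snippet above.

```php
<?php
// Hypothetical helper (not part of Pimcore): renames bucket keys via a
// JSON round trip, mirroring the str_replace() calls from the snippet above.
function renameBucketKeys(array $buckets): array
{
    $json = json_encode($buckets);
    $json = str_replace('key":', 'value":', $json);
    $json = str_replace('doc_count":', 'count":', $json);

    return json_decode($json, true);
}

// Hand-made sample shaped like an Elasticsearch terms aggregation.
$result = [
    'aggregations' => [
        'system.parentCategoryIds' => [
            'buckets' => [
                ['key' => 101, 'doc_count' => 12],
                ['key' => 102, 'doc_count' => 7],
            ],
        ],
    ],
];

$prepared = [];
foreach ($result['aggregations'] as $fieldname => $aggregation) {
    $prepared[$fieldname] = renameBucketKeys($aggregation['buckets']);
}

// The first bucket is now ['value' => 101, 'count' => 12].
print_r($prepared['system.parentCategoryIds'][0]);
```

Note that this flat sample has no sub-aggregations, so it says nothing about how nested buckets should be converted.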

This example only demonstrates that converting large arrays bucket by bucket can cause a lot of performance overhead. The code above would need careful refactoring before it could be used. In general, it might be very helpful to profile Pimcore's e-commerce product lists with larger numbers of categories/aggregations, as such profiling is very useful for identifying potential performance issues.
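As a starting point for such profiling, the following is a rough micro-benchmark sketch. It runs on synthetic buckets and compares a plain PHP loop against the JSON round trip; it does not call `searchForBuckets` or `convertBucketValues`, so it will not reproduce the timings quoted above and may well show a different ranking. It assumes PHP 7.3 or later for `hrtime()`.

```php
<?php
// Synthetic buckets; 20,000 mirrors the category count mentioned above.
$buckets = [];
for ($i = 0; $i < 20000; $i++) {
    $buckets[] = ['key' => $i, 'doc_count' => $i % 50];
}

// Variant A: plain loop that builds a new array per bucket.
$start = hrtime(true);
$viaLoop = [];
foreach ($buckets as $bucket) {
    $viaLoop[] = ['value' => $bucket['key'], 'count' => $bucket['doc_count']];
}
$loopMs = (hrtime(true) - $start) / 1e6;

// Variant B: JSON round trip with string replacement.
$start = hrtime(true);
$json = json_encode($buckets);
$json = str_replace(['key":', 'doc_count":'], ['value":', 'count":'], $json);
$viaJson = json_decode($json, true);
$jsonMs = (hrtime(true) - $start) / 1e6;

// Report both timings and whether the two variants produce the same array.
printf(
    "loop: %.2f ms, json: %.2f ms, identical: %s\n",
    $loopMs,
    $jsonMs,
    var_export($viaLoop === $viaJson, true)
);
```

Measuring inside the real `doLoadGroupByValues` call, with and without the debugger, would be the more meaningful test; this sketch only shows the shape of such a comparison.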

fashxp commented 4 years ago

Can you provide a PR, with comments explaining why we are doing it this way?

andreas-gruenwald commented 4 years ago

I am not sure whether this solution is ready for a PR yet, as it might be fragile with nested aggregations, etc. We will investigate it within the project and I will create a PR as soon as there is a stable outcome.
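One concrete way the string replacement can misbehave is sketched below. The needle `'key":'` from the demo is not anchored, so it also rewrites any property name that merely ends in `key`. The sub-aggregation name `brand_key` is a made-up example, and the anchored needle `'"key":'` is only one possible mitigation, not a tested fix.

```php
<?php
// Hand-made bucket with a hypothetical sub-aggregation named "brand_key".
$buckets = [
    [
        'key' => 101,
        'doc_count' => 12,
        'brand_key' => ['buckets' => [['key' => 'acme', 'doc_count' => 5]]],
    ],
];
$json = json_encode($buckets);

// Unanchored needle from the demo: also rewrites the tail of "brand_key".
$loose = json_decode(str_replace('key":', 'value":', $json), true);

// Anchored needle: only matches a property that is exactly "key".
$strict = json_decode(str_replace('"key":', '"value":', $json), true);

var_dump(array_keys($loose[0]));  // includes "brand_value"
var_dump(array_keys($strict[0])); // still includes "brand_key"
```

Both variants rename `key` at every nesting level, which may or may not match what `convertBucketValues` produces for sub-aggregations.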

pimcore / ecommerce-framework-bundle

Ecommerce Performance | Use a more efficient data structure for retrieving Elastic Search aggregations? #90
