litespeedtech / lscache-opencart

GNU General Public License v3.0

Crawler exceeds PHP memory limit with a huge number of products #18

Open AndreyPopovNew opened 3 years ago

AndreyPopovNew commented 3 years ago

By default, OpenCart has THREE paths to a product page:

  1. by product_id only: /index.php?route=product/product&product_id=41
  2. by category_id (category path): /index.php?route=product/product&path=20_27&product_id=41
  3. by manufacturer_id: /index.php?route=product/product&manufacturer_id=8&product_id=41

The crawler algorithm covers paths 1 and 2, but path 3 (by manufacturer_id) was forgotten!
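
To make the three URL forms concrete, here is a minimal standalone sketch. `productUrlVariants()` is a hypothetical helper written for illustration only; it is not part of OpenCart or lscache, and it builds plain query strings instead of calling `$this->url->link()`. Skipping the manufacturer URL when the ID is 0 is my assumption about products without a manufacturer.

```php
<?php
// Hypothetical helper (not part of OpenCart or lscache): returns the URLs
// for the three default product-page routes listed above.
function productUrlVariants(int $productId, array $categoryPaths, int $manufacturerId): array
{
    $base = 'index.php?route=product/product';
    $urls = [];

    // 1. Product ID only.
    $urls[] = $base . '&product_id=' . $productId;

    // 2. One URL per category path, e.g. "20_27".
    foreach ($categoryPaths as $path) {
        $urls[] = $base . '&path=' . $path . '&product_id=' . $productId;
    }

    // 3. Manufacturer route. Assumption: ID 0 means "no manufacturer", so skip it.
    if ($manufacturerId > 0) {
        $urls[] = $base . '&manufacturer_id=' . $manufacturerId . '&product_id=' . $productId;
    }

    return $urls;
}

print_r(productUrlVariants(41, ['20_27'], 8));
```

So each product contributes at least one URL, plus one per category path and one for the manufacturer, which is why the URL list grows faster than the product count.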

With a huge number of products, for example more than 6000, the $urls array exceeds the PHP memory limit and the crawler stops!
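
Before picking a batch size it may help to measure how much memory the URL strings themselves take. The sketch below is only a rough check under stated assumptions: `measureUrlArrayMemory()` is a hypothetical helper, the host is a placeholder, and it measures the URL array alone, not the product rows or `$categoryPath` that the crawler also holds.

```php
<?php
// Hypothetical helper: builds $count synthetic product URLs and returns the
// number of bytes by which PHP's memory usage grew while the array was alive.
function measureUrlArrayMemory(int $count): int
{
    $before = memory_get_usage();
    $urls = [];
    for ($i = 1; $i <= $count; $i++) {
        // Placeholder host; the shape mimics the category-path URL form.
        $urls[] = 'https://example.com/index.php?route=product/product&path=20_27&product_id=' . $i;
    }
    $after = memory_get_usage();

    return $after - $before;
}

// Assumed workload: 6000 products with about 3 URLs each.
$bytes = measureUrlArrayMemory(18000);
printf("URL array: %.2f MiB, memory_limit: %s\n", $bytes / 1048576, ini_get('memory_limit'));
```

Comparing the printed size with `memory_limit` gives a hint whether the URL array or something else in the request is the main consumer.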

That is why I replaced the following code in

catalog/controller/extension/module/lscache.php

echo 'recache product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        foreach ($this->model_catalog_product->getProducts() as $result) {
            foreach ($this->model_catalog_product->getCategories($result['product_id']) as $category) {
                if (isset($categoryPath[$category['category_id']])) {
                    $urls[] = $this->url->link('product/product', 'path=' . $categoryPath[$category['category_id']] . '&product_id=' . $result['product_id']);
                }
            }
            $urls[] = $this->url->link('product/product', 'product_id=' . $result['product_id']);
        }

        $this->crawlUrls($urls, $cli);

with this:

 echo 'recache product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        $UrlsCount = 0;
        $UrlsCountCount = 0;
        $this->load->model('catalog/manufacturer');
        foreach ($this->model_catalog_product->getProducts() as $result) {
            foreach ($this->model_catalog_product->getCategories($result['product_id']) as $category) {
                if(isset( $categoryPath[$category['category_id']] )){
                    $urls[] = $this->url->link('product/product', 'path=' . $categoryPath[$category['category_id']] . '&product_id=' . $result['product_id']);
                    $UrlsCount++;
                }
            }
            $urls[] = $this->url->link('product/product', 'manufacturer_id=' . $result['manufacturer_id'] . '&product_id=' . $result['product_id']);
            $UrlsCount++;

            $urls[] = $this->url->link('product/product', 'product_id=' . $result['product_id']);
            $UrlsCount++;

            if ( $UrlsCount > 4096 ) {
                $UrlsCountCount++;
                echo 'recache '. $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
                $this->crawlUrls($urls, $cli);
                $urls = array();
                $UrlsCount = 0;
            }
        }
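        // Crawl whatever is left over from the last, partially filled batch.
        if (!empty($urls)) {
            $UrlsCountCount++;
            echo 'recache '. $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
            $this->crawlUrls($urls, $cli);
        }

AndreyPopovNew commented 3 years ago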

After some tests under heavy load in real conditions, I found that 4096 URLs in the $urls array can also exceed the PHP memory limit.

Part of the problem is $categoryPath, which also takes up memory.

I decided to reduce the $UrlsCount limit to 2048. Testing...

 echo 'recache product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        $UrlsCount = 0;
        $UrlsCountCount = 0;
        $this->load->model('catalog/manufacturer');
        foreach ($this->model_catalog_product->getProducts() as $result) {
            foreach ($this->model_catalog_product->getCategories($result['product_id']) as $category) {
                if(isset( $categoryPath[$category['category_id']] )){
                    $urls[] = $this->url->link('product/product', 'path=' . $categoryPath[$category['category_id']] . '&product_id=' . $result['product_id']);
                    $UrlsCount++;
                }
            }
            $urls[] = $this->url->link('product/product', 'manufacturer_id=' . $result['manufacturer_id'] . '&product_id=' . $result['product_id']);
            $UrlsCount++;

            $urls[] = $this->url->link('product/product', 'product_id=' . $result['product_id']);
            $UrlsCount++;

            if ( $UrlsCount > 2048 ) {
                $UrlsCountCount++;
                echo 'recache '. $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
                $this->crawlUrls($urls, $cli);
                $urls = array();
                $UrlsCount = 0;
            }
        }
        // Crawl whatever is left over from the last, partially filled batch.
        if (!empty($urls)) {
            $UrlsCountCount++;
            echo 'recache '. $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
            $this->crawlUrls($urls, $cli);
        }
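
The same batching idea can be sketched on its own with a generator, so the batch size becomes a parameter and no counter has to be reset by hand. This is only an illustration: `batchUrls()` is a hypothetical helper, not part of lscache, and the numbers from `range()` stand in for real URLs.

```php
<?php
// Hypothetical helper: groups any iterable of URLs into arrays of at most
// $batchSize items and yields each batch as soon as it is full.
function batchUrls(iterable $urls, int $batchSize): Generator
{
    $batch = [];
    foreach ($urls as $url) {
        $batch[] = $url;
        if (count($batch) >= $batchSize) {
            yield $batch;
            $batch = [];
        }
    }
    // Yield the final, partially filled batch, if there is one.
    if ($batch !== []) {
        yield $batch;
    }
}

// Usage with placeholder data; in the crawler each batch would be passed
// to crawlUrls() instead of being counted.
$part = 0;
foreach (batchUrls(range(1, 5000), 2048) as $batch) {
    $part++;
    echo 'recache part ' . $part . ': ' . count($batch) . ' urls' . PHP_EOL;
}
```

Only one batch is held in memory at a time, so the limit can be tuned (4096, 2048, or lower) without touching the loop that builds the URLs.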