nystudio107 / craft-imageoptimize

Automatically create & optimize responsive image transforms, using either native Craft transforms or a service like Imgix, with zero template changes.
https://nystudio107.com/plugins/imageoptimize
Other
235 stars 36 forks source link

OOM when resaving optimised images #243

Closed mattgrayisok closed 3 years ago

mattgrayisok commented 3 years ago

Describe the bug

Speculative diagnosis after brief investigation

If a user uses a single assets volume for all images, and has lots of content, the resave optimized images background task can OOM PHP by trying to pull the entire contents of Craft's content table into memory.

Experienced this with a couple of clients, one was fixed by bumping the memory limit up to 512 MB, the other has a very large content table so raising the memory limit isn't feasible.

I believe it's this query:

https://github.com/nystudio107/craft-imageoptimize/blob/bd6b351851dda437af6553a4427aaceb12a2a521/src/jobs/ResaveOptimizedImages.php#L66

With these criteria:

https://github.com/nystudio107/craft-imageoptimize/blob/bd6b351851dda437af6553a4427aaceb12a2a521/src/services/OptimizedImages.php#L326

Causing an SQL statement which looks like this:

SELECT `elements`.`id`, `elements`.`fieldLayoutId`, `elements`.`uid`, `elements`.`enabled`, `elements`.`archived`, `elements`.`dateCreated`, `elements`.`dateUpdated`, `elements_sites`.`id` AS `siteSettingsId`, `elements_sites`.`slug`, `elements_sites`.`siteId`, `elements_sites`.`uri`, `elements_sites`.`enabled` AS `enabledForSite`, `assets`.`volumeId`, `assets`.`folderId`, `assets`.`filename`, `assets`.`kind`, `assets`.`width`, `assets`.`height`, `assets`.`size`, `assets`.`focalPoint`, `assets`.`keptFile`, `assets`.`dateModified`, `volumeFolders`.`path` AS `folderPath`, `assets`.`uploaderId`, `content`.`id` AS `contentId`, `content`.`title`, `content`.`field_addNewsFeed`, `content`.`field_body`, `content`.`field_chooseTemplate`, 
... all content fields ...
`content`.`field_youtube`
FROM (SELECT `elements`.`id` AS `elementsId`, `elements_sites`.`id` AS `elementsSitesId`, `content`.`id` AS `contentId`
FROM `elements` `elements`
INNER JOIN `assets` `assets` ON `assets`.`id` = `elements`.`id`
INNER JOIN `volumefolders` `volumeFolders` ON `volumeFolders`.`id` = `assets`.`folderId`
INNER JOIN `elements_sites` `elements_sites` ON `elements_sites`.`elementId` = `elements`.`id`
INNER JOIN `content` `content` ON `content`.`elementId` = `elements`.`id`
WHERE (`assets`.`volumeId`='10') AND (`elements`.`archived`=FALSE) AND (`elements`.`dateDeleted` IS NULL) AND (`elements`.`draftId` IS NULL) AND (`elements`.`revisionId` IS NULL)) `subquery`
INNER JOIN `assets` `assets` ON `assets`.`id` = `subquery`.`elementsId`
INNER JOIN `volumefolders` `volumeFolders` ON `volumeFolders`.`id` = `assets`.`folderId`
INNER JOIN `elements` `elements` ON `elements`.`id` = `subquery`.`elementsId`
INNER JOIN `elements_sites` `elements_sites` ON `elements_sites`.`id` = `subquery`.`elementsSitesId`
INNER JOIN `content` `content` ON `content`.`id` = `subquery`.`contentId`

Which you can probably tell is a bit dangerous! If all elements have an associated image, and they're all stored in a single volume, this will try to load the entire content table into memory.

Resulting in:

Allowed memory size of 536870912 bytes exhausted (tried to allocate 8192 bytes) in /var/www/html/vendor/yiisoft/yii2/db/Command.php:1293

Perhaps rather than setting the limit and offset values to null when executing this query the results could be chunked into groups of 50 or something similar which should resolve it.

To reproduce

Steps to reproduce the behaviour:

  1. Have a single assets volume
  2. Have an image associated with all content elements
  3. Have lots of content
  4. Resave optimized images

Expected behaviour

No explosions

Versions

khalwat commented 3 years ago

So I'm explicitly using ->each() here to avoid the situation you're discussing:

https://github.com/nystudio107/craft-imageoptimize/blob/v1/src/jobs/ResaveOptimizedImages.php#L83

It was added in to fix the situation you're describing, and some time ago... so I'm intrigued that this is coming up

From my understanding of how ->each() works, it should be iterating through it in batches, loading in at most the default of 100 rows at a time, but handling the "pagination" and such for me

https://www.yiiframework.com/doc/guide/2.0/en/db-query-builder#batch-query

dlindberg commented 3 years ago

Looks like a limitation of the MySQL PDO driver which will just buffer the whole thing anyway: https://www.yiiframework.com/doc/guide/2.0/en/db-query-builder#batch-query-mysql

khalwat commented 3 years ago

So this is pretty interesting:

https://github.com/yiisoft/yii2/issues/8420

It looks like the TL;DR on this is ->each() and ->batch() simply don't work as advertised using the MySQL PDO extension unless you jump through hoops doing this:

$unbufferedDb = new \yii\db\Connection([
    'dsn' => Yii::$app->db->dsn,
    'username' => Yii::$app->db->username,
    'password' => Yii::$app->db->password,
    'charset' => Yii::$app->db->charset,
]);
$unbufferedDb->open();
$unbufferedDb->pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

foreach ($query->batch(1000, $unbufferedDb) as $batch) {
    //...
}

The interesting thing is I see Craft using ->each() in a number of places, I'll loop them in in case they are unaware of the issue

Screen Shot 2020-12-28 at 10 03 46 PM

This post in that thread summarizes the potential solutions:

https://github.com/yiisoft/yii2/issues/8420#issuecomment-612742289

khalwat commented 3 years ago

Super-relavent, and also embarassingly, a discussion involving me:

https://twitter.com/markhuot/status/1240379989603897345

https://forum.yiiframework.com/t/large-amount-data-with-less-memory/81519/3

It looks like I could be using ActiveDataProvider here:

https://www.yiiframework.com/doc/api/2.0/yii-data-activedataprovider

$provider = new ActiveDataProvider([
    'query' => Post::find(),
    'pagination' => [
        'pageSize' => 20,
    ],
]);

// get the posts in the current page
$posts = $provider->getModels();
khalwat commented 3 years ago

Addressed in: https://github.com/nystudio107/craft-imageoptimize/commit/296e1e3512f2a1ddb08e86e26e131876fd91bf51

You can try it now by setting your semver in your composer.json to look like this:

    "nystudio107/craft-imageoptimize": "dev-develop as 1.6.21",

Then do a composer update

khalwat commented 3 years ago

So I refactored it (again) to use craft\db\Paginator in https://github.com/nystudio107/craft-imageoptimize/commit/c3e2a5b1111c6f94065b75c9e3f3b17b6142e109

You can try it now by setting your semver in your composer.json to look like this:

    "nystudio107/craft-imageoptimize": "dev-develop as 1.6.21",

Then do a composer update

Brandon has also addressed this in Craft 3.6: https://github.com/craftcms/cms/issues/7338

But I felt like I should fix it here as well. lmk!

khalwat commented 3 years ago

Released in https://github.com/nystudio107/craft-imageoptimize/releases/tag/1.6.21