silverstripe / silverstripe-s3

Silverstripe module to store assets in S3 rather than on the local filesystem (SS4/SS5 only)
BSD 3-Clause "New" or "Revised" License

Caching and memory usage issues with images stored in S3 #48

Closed markstuart closed 2 weeks ago

markstuart commented 2 years ago

We are seeing significant memory usage on pages with a lot of images when using this module. After some investigation, it seems that the issue is that when the cache that keeps a memory of the image urls gets dropped, the next request to these pages recalculates the image url by resizing the original image first. However, these images have likely been processed previously and already live in S3 (with a __FitMax or whatever variant added to the image name).

My questions are:

obj63mc commented 2 years ago

Thinking through this issue, if I had to guess, running manipulations on the images in the template hits the Intervention manipulation cache first. https://github.com/silverstripe/silverstripe-assets/issues/197

If that is not present, it probably runs the manipulation code and then hits the file system. That wouldn't necessarily matter whether you're using S3 or not; the issue is just worse with S3 because of all the extra network requests and its own caching. I'm not 100% sure of the full execution path of the template engine, but it may be worth disabling the flushable cache mentioned in the bug linked above to see if that helps.

Also are you able to profile with and without the S3 module?

There is one flysystem cache issue https://github.com/thephpleague/flysystem-cached-adapter/issues/23

If you are using a cache store like Redis, it stores all the cache in one entry, which must then be JSON-decoded on each pull from the store. If your site has, say, thousands of images, this could be an issue.
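
The cost of that single-entry design can be sketched in Python (illustrative only; the real adapter is league/flysystem-cached-adapter, and the names below are made up): every cache read deserializes metadata for every file, even when only one path is needed.

```python
import json

# Simulate a single-entry metadata cache: metadata for ALL files is
# serialized into one JSON blob stored under one cache key.
cache_blob = json.dumps({
    f"assets/image-{i}.jpg": {"type": "file", "size": 1024 + i}
    for i in range(5000)
})

def lookup(path):
    # Every lookup decodes the whole blob, not just the entry we want,
    # so cost grows with the total number of files on the site.
    entries = json.loads(cache_blob)
    return entries.get(path)

meta = lookup("assets/image-42.jpg")
```

The per-request decode is O(total files), which is why a site with thousands of images can slow down even when the cache is warm.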

maxime-rainville commented 2 years ago

Normally the CMS will not attempt to recreate image variants if they've already been created. This bit is controlled in ImageManipulation.php:

        $store = Injector::inst()->get(AssetStore::class);
        $tuple = $manipulationResult = null;
        if (!$store->exists($filename, $hash, $variant)) {

The logic for deciding whether a file exists is controlled in FlysystemAssetStore:

    public function exists($filename, $hash, $variant = null)
    {
        if (empty($filename) || empty($hash)) {
            return false;
        }

        // If `applyToFileOnFilesystem` calls our closure we'll know for sure that a file exists
        return $this->applyToFileOnFilesystem(
            function (ParsedFileID $parsedFileID, Filesystem $fs, FileResolutionStrategy $strategy) use ($hash) {
                $parsedFileID = $strategy->stripVariant($parsedFileID);

                if ($parsedFileID && $originalFileID = $parsedFileID->getFileID()) {
                    if ($fs->has($originalFileID)) {
                        /** @var FileHashingService $hasher */
                        $hasher = Injector::inst()->get(FileHashingService::class);
                        $actualHash = $hasher->computeFromFile($originalFileID, $fs);

                        // If the hash of the file doesn't match we return false, because we want to keep looking.
                        return $hasher->compare($actualHash, $hash);
                    }
                }

                return false;
            },
            new ParsedFileID($filename, $hash, $variant)
        ) ?: false;
    }

Off the top of my head, it's not quite clear how this will interact with an S3 Adapter. In your context, it might be downloading each file and trying to compute their hashes ... which could be very painful and slow.

A quick fix could be to subclass FlysystemAssetStore and provide your own implementation of exists. You would then need to switch to using your own custom class with:

---
Name: my-project-silverstripes3-assetscore
Only:
  envvarset: AWS_BUCKET_NAME
After:
  - '#silverstripes3-assetscore'
---
SilverStripe\Core\Injector\Injector:
  # Define our SS asset backend
  SilverStripe\Assets\Storage\AssetStore:
    class: App\Project\MyCustomAssetStore

Alternatively you could create a job that pre-warms your URL cache. Are you seeing this problem, when you flush the Silverstripe cache or does the League\Flysystem\Cached\CachedAdapter have a TTL that self invalidates?

markstuart commented 2 years ago

Thanks for the well thought out input @obj63mc and @maxime-rainville, gives us a few things to investigate, will let you know how we get on :+1:

sfozz commented 2 years ago

@maxime-rainville I'm looking a bit further into this... From what I have read, if we create a variant with Fit(), should we expect that variant to be recorded anywhere?

For example, in the File table in the DB there is a Variant column, but even on systems where we are just using the default store this is always NULL. This seems a bit odd: if the image has already been resized, you could just record it in the DB as a new variant of the existing image and save having to reprocess the image every time the cache is flushed.

samandeggs commented 1 year ago

@markstuart @sfozz did either of you figure out a solution or workaround here? This is an issue that we're experiencing, and I'm nearing the point where I'm going to have to dig into this, as the performance hit is getting ridiculous. Thinking perhaps one of you solved it internally and we can get a PR going.

It does seem that Variant is not being utilised when it should or could be, or at least that something is highly inefficient when checking whether the files already exist, as others have mentioned. This system should absolutely not be regenerating images on each load, and separately, the silverstripe-cache shouldn't be bypassed as much as it has been.

Let me know, otherwise I'll start digging into this and if solved I'll begin a PR at some point in the next weeks - but I will need some assistance from the community here.

We're having to rely a lot on CloudFront caching to mitigate this, but those first hits after the CF cache expires are painful, even when it's only 30 or 40 images and all of them were previously processed.

sfozz commented 1 year ago

For the specific project that it was causing us problems on we moved to having the image assets on EFS as this was killing performance.

In the use case we had it was an image gallery with 400+ images, still slow but not failing.

samandeggs commented 1 year ago

> For the specific project that it was causing us problems on we moved to having the image assets on EFS as this was killing performance.
>
> In the use case we had it was an image gallery with 400+ images, still slow but not failing.

Okay thanks. We're currently pondering whether to utilise our own upload field and own image class to overcome some of these issues we're facing. Ideally it can be solved within this repo, though - so others can take advantage of any resolutions.

rskuipers commented 1 year ago

We're running into the same issues trying to integrate with Digital Ocean Spaces (S3 compatible).

I put a network proxy in between to see the requests and it's doing about 10 requests per image. For 19 images on our homepage that's 190 network requests.

When the cache is warmed up it's all HEAD requests to check for existence, and when running from my localhost each call takes about 23ms. Add some GET and PUT requests of varying sizes to that on the initial page load...

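
A back-of-the-envelope calculation, using the numbers above and assuming the HEAD requests are issued sequentially, shows why even the warm-cache path is slow:

```python
images = 19              # images on the homepage
requests_per_image = 10  # observed via the network proxy
latency_s = 0.023        # ~23 ms per HEAD request from localhost

total_requests = images * requests_per_image    # 190 requests
sequential_wait_s = total_requests * latency_s  # ~4.4 s of pure latency
```

Even ignoring GET/PUT transfer time, that's several seconds of round-trip latency per uncached page load.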

Silverstripe's way of handling images feels odd; I don't understand why image manipulations are executed immediately instead of being queued and executed at render time in one pass.

If anyone is down to do some pair programming to try and figure this out let me know.

samandeggs commented 1 year ago

> If anyone is down to do some pair programming to try and figure this out let me know.

Somewhat keen, but too slammed with work at the moment to fully contribute. For us, because we utilise a CDN (also powered by CloudFront), the issue only crops up when a visitor hits a page that is currently uncached; all subsequent loads hit the CDN for the images and load instantly. The CDN is also using HTTP/2, so the number of simultaneous requests is increased and the issue is lessened even where it comes down to the browser making many image requests.

The issue very much seems to be that Silverstripe is checking whether the images are present (many users have pointed this out, including yourself), and/or even regenerating them. I've run a quick check, and the images aren't regenerated into S3 until a modification has occurred in Silverstripe, which is correct. However, I'm unsure whether the Silverstripe cache is being regenerated each time, which would add to the slowdown; otherwise the slowdown is likely simply that Silverstripe checks the existence of the file and each of its variants on every load, which is largely unnecessary when a URL is all that is required.

At least as an option, we should be able to assume that S3 is there, that it has the image, and that the stored URL is correct, and only attempt these exists() checks once a modification to the image has occurred. Ideally the URLs of each variant of an image would be recorded and treated as fixed until a modification occurs, or at least cached with a TTL, so we aren't checking for the existence of these images on every single load. There has to be some level of assumption that the product is working as intended: the exists() checks produce the same result for the visitor whether the image is present or not (a failure to load), so running exists() is unnecessary baggage if the URL/variant is stored in the database and assumed correct.

Even if the manipulations are run immediately on the next load after a modification, any subsequent loads should assume everything is correct, since any other modifications to the images should have been performed via Silverstripe, which therefore has full awareness of the situation. Within the context of the CMS, the check could still occur, though it's probably unnecessary, as the CMS seems to handle 400 errors fine when images are no longer available.

I don't know quite enough here, but my expectation is that when trying to load a variant of an image, say FillMax(), Silverstripe wants to first check that the image exists before operating on it. However, if the variant exists and was created after the image was last modified, it should simply load the stored variant URL, which points directly to the object in S3 (via the CDN if one is in place).

I'm unsure whether these variant functions are the slowdown in themselves and everything is actually operating as quickly as possible, or whether some other superfluous check is involved, but I suspect a simple DB check that the variant URL is present and up to date, versus a full check of the image's existence, would increase the speed dramatically.

samandeggs commented 1 year ago

After a little bit more digging thanks to direction from @maxime-rainville - some thoughts:

stnvh commented 1 year ago

The default flysystem cache adapter used in this module is a memory adapter which is not persistent in any way. In my experience this causes many API calls to S3 on each page load to check for file existence, metadata, mime type.

Within this, there are other reasons the API calls can be slow overall. Storing lots of objects in a bucket is a growing cost, because ListObjectsV2 only returns pages of 1000 objects at a time. The underlying filesystem adapter also makes no use of prefixing to query or store items.
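
The pagination point in numbers (an illustrative sketch; 1000 keys per page is the ListObjectsV2 maximum, and `list_calls_needed` is a made-up helper, not an AWS API):

```python
import math

def list_calls_needed(object_count, page_size=1000):
    # ListObjectsV2 returns at most `page_size` keys per call, so a full
    # listing of a bucket requires ceil(n / page_size) sequential requests,
    # each chained on the previous page's continuation token.
    return math.ceil(object_count / page_size)
```

So a bucket with 45,000 assets needs 45 sequential listing calls just to enumerate its contents, which compounds the per-request latency problem above.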

A solution I found was to use a semi-persistent cache like memcached. There are a few others built in to choose from including PSR-6 support which could enable filesystem caching. See league/flysystem-cached-adapter.

---
Only:
  envvarset:
    - MEMCACHED_HOST
    - AWS_BUCKET_NAME
After:
  - '#assetsflysystem'
  - '#silverstripes3-flysystem'
---
SilverStripe\Core\Injector\Injector:
  MemcachedClient:
    class: 'Memcached'
    calls:
      - [ addServer, [ '`MEMCACHED_HOST`', 11211 ] ]

  MemcachedFlysystemCache:
    class: League\Flysystem\Cached\Storage\Memcached
    constructor:
      client: '%$MemcachedClient'
      key: 'flysystem'
      expire: 2592000

  League\Flysystem\Cached\Storage\Memory.public: '%$MemcachedFlysystemCache'
  League\Flysystem\Cached\Storage\Memory.protected: '%$MemcachedFlysystemCache'

EDIT: An example using the filesystem instead of memcached

---
Only:
  envvarset:
    - AWS_BUCKET_NAME
After:
  - '#assetsflysystem'
  - '#silverstripes3-flysystem'
---
SilverStripe\Core\Injector\Injector:
  Psr6FlysystemCache:
    class: League\Flysystem\Cached\Storage\Psr6Cache
    constructor:
      pool: '%$Symfony\Component\Cache\Adapter\FilesystemAdapter'
      key: 'flysystem'
      expire: 2592000

  League\Flysystem\Cached\Storage\Memory.public: '%$Psr6FlysystemCache'
  League\Flysystem\Cached\Storage\Memory.protected: '%$Psr6FlysystemCache'

samandeggs commented 1 year ago

@stnvh thanks a lot - your second example lowered load times from 25 seconds to 6 seconds. It will obviously lose these cached entries between container deploys, but we can solve that separately, as it's an infrastructure concern more than anything else. I'm surprised this behaviour isn't the default in this repo, as it combines the best of Silverstripe and S3.

I'm satisfied with this solution for the time being as it resolves the bulk of my issues as far as I can tell.

samandeggs commented 1 year ago

@stnvh do you happen to do anything with these to target specific file types? For us, the desire to cache items like this would be limited to images only; caching other file types is definitely not desirable, if that level of granular control is possible.

stnvh commented 1 year ago

@samandeggs The flysystem cache adapter is very low level and only holds a list of file paths and metadata rather than caching the files themselves. I would certainly recommend the config above for all files stored in S3, to prevent API requests on each load; however, if you want to granularly control which items are stored for whatever reason, the best start would be to subclass SilverStripe\S3\Adapter\PublicCachedAdapter.

In response to your first reply: you can preserve the cache across containers by using something like memcached or redis; the list then wouldn't need to be regenerated per container. You'll likely notice other performance benefits too if you set Silverstripe to use it for other caches.

Right now the default cache adapter for this module is, under the hood, essentially a simple, non-persistent array that only lasts the lifetime of the application. The next load checks the (empty) cache, makes all the S3 requests again to check that a file exists and to query its mime type and other metadata, stores the results back in the array, and the cycle restarts.

Utilising a persistent cache store instead makes this list available across loads (until the next flush). Flysystem handles the fallback: if a new image is added that doesn't exist in the cache, it does a lookup to S3 and adds the metadata so the next lookup is faster. This is a good example of why the flysystem cached adapter exists; by using a non-persistent adapter you lose all of these benefits.
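
The read-through behaviour described above can be sketched generically (Python for illustration; the class and function names are hypothetical, not flysystem's API):

```python
class ReadThroughCache:
    """Persistent metadata cache in front of a slow backend (e.g. S3)."""

    def __init__(self, backend_lookup):
        self._store = {}                  # stand-in for memcached/redis
        self._backend_lookup = backend_lookup
        self.backend_calls = 0            # counts round trips to the backend

    def get_metadata(self, path):
        if path in self._store:           # warm path: no network request
            return self._store[path]
        self.backend_calls += 1           # cold path: one backend round trip
        meta = self._backend_lookup(path)
        self._store[path] = meta          # cache for subsequent loads
        return meta

def fake_s3_head(path):
    # Stand-in for a HEAD request against S3.
    return {"path": path, "exists": True, "mime": "image/jpeg"}

cache = ReadThroughCache(fake_s3_head)
cache.get_metadata("assets/hero.jpg")     # cold: hits the backend once
cache.get_metadata("assets/hero.jpg")     # warm: served from the cache
```

With a non-persistent store, `_store` starts empty on every request, so every page load pays the cold path for every file; a persistent store pays it once per file until the next flush.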

The default cache adapter is likely fine when files are stored on the local filesystem, because latency is low (disk r/w speed being the only limit). A persistent cache is not really needed there (unless you want to minimise disk I/O); the filesystem is fast enough to look up directly on load and very good at telling applications where and what things are. When those calls are swapped out for something like S3, where each read/write becomes several network requests per operation, a persistent cache needs to be configured to reduce that work each time a file is requested.

There are concerns if you have lots of files on your site and use caching; see https://github.com/thephpleague/flysystem-cached-adapter/issues/23. Most of these issues can be avoided by subclassing the cache adapter and adding an md5 of the request URI or similar, so each page essentially gets its own image cache: ugly, but scalable.
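
The per-page key idea can be sketched as follows (illustrative Python; in practice you would override the cache key used by your cache adapter subclass, and `page_cache_key` is a hypothetical helper):

```python
import hashlib

def page_cache_key(request_uri, base_key="flysystem"):
    # Partition the metadata cache per page, so no single cache entry
    # grows with the total number of files on the site.
    digest = hashlib.md5(request_uri.encode("utf-8")).hexdigest()
    return f"{base_key}:{digest}"

home_key = page_cache_key("/")
gallery_key = page_cache_key("/gallery")
```

Each page then decodes only the metadata for its own files, at the cost of some duplication for images shared across pages.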

obj63mc commented 9 months ago

Looking into this for issue #63 - @maxime-rainville - when looking at the FlysystemAssetStore.php class:

The exists function calls applyToFileOnFilesystem, which itself calculates the hash of the file and ensures it is valid. Then in the exists function, even per its own comment:

    // If `applyToFileOnFilesystem` calls our closure we'll know for sure that a file exists

The file is then rehashed again to see if it matches. This seems unnecessary and just adds extra overhead. I can say for sure that when checking the hash uncached, Silverstripe streams the file and calculates the hash; at least the result is cached, but it is extremely slow. Is there a real reason for calculating these hashes, given that the filesystem's has function already checks whether the file actually exists? I'm just trying to understand why we have to run a sha1 hash for every file. If I comment out the hashing checks in applyToFileOnFilesystem and exists, it runs super fast and I see no issues when using the filesystem. If I try to upload a duplicate, it creates a new version (v2) file as expected.

If I replace the file using the asset admin with this commented out also no issues as well.

If these hashes are not really needed, I may subclass the FlysystemAssetStore to remove that functionality as it really does slow down the CMS admin.

samandeggs commented 9 months ago

@obj63mc will this PR only be supported on SS 5.1+? We're not planning on shifting to 5 for some time (as it's still very new), since 4.13 still has "Full Support" through April 2024 (and EOL April 2025). This has been an ongoing issue for a very long time, and this is the first promising alternative to manipulating existing configs for an iffy workaround.

Please advise whether only parts will require 5.1, or whether the intention is that the major version which includes these caching updates will require 5.1+.

Edit: I presume this depends entirely on whether flysystem v3 is required for this caching change; if so, I'm guessing it can't be achieved on SS 4.13. However, if there is capacity to split out a change that supports the caching fix on 4.13, that would be extremely welcome.

obj63mc commented 9 months ago

So the main PR for #63 is for SS 5.1, as I had to upgrade all the flysystem code to work with flysystem v3. On SS 4 and 5.0 you would have to set up your own caching as noted above, whether filesystem, redis, memcached, etc.

What I am testing is simply subclassing the FlysystemAssetStore class as noted above - https://github.com/silverstripe/silverstripe-s3/issues/48#issuecomment-929675809 - and then overriding the YML config to use it. The class below just rewrites the two methods, applyToFileOnFilesystem and exists. If you compare it against the original in the silverstripe-assets repo, you will see that I have just returned true in the exists function instead of hashing, and removed the hashing from applyToFileOnFilesystem. The main reason for overriding this is that when the hasher actually hashes the file it has to stream it from S3. We know the file exists if the S3 client says it exists, so there is really no need for the hashing. I am also trying to find out why Silverstripe hashes files to begin with.

    <?php

    namespace App\Assets\Flysystem;

    use SilverStripe\Assets\FilenameParsing\FileResolutionStrategy;
    use SilverStripe\Assets\FilenameParsing\ParsedFileID;
    use SilverStripe\Assets\Flysystem\Filesystem;
    use SilverStripe\Assets\Flysystem\FlysystemAssetStore as SSFlysystemAssetStore;
    use SilverStripe\Assets\Storage\FileHashingService;
    use SilverStripe\Core\Injector\Injector;

    class FlysystemAssetStore extends SSFlysystemAssetStore
    {
        protected function applyToFileOnFilesystem(callable $callable, ParsedFileID $parsedFileID, $strictHashCheck = true)
        {
            $publicSet = [
                $this->getPublicFilesystem(),
                $this->getPublicResolutionStrategy(),
                self::VISIBILITY_PUBLIC
            ];

            $protectedSet = [
                $this->getProtectedFilesystem(),
                $this->getProtectedResolutionStrategy(),
                self::VISIBILITY_PROTECTED
            ];

            // Note: unlike the parent implementation, we no longer inject the
            // FileHashingService here, since this subclass skips hashing entirely.

            /** @var Filesystem $fs */
            /** @var FileResolutionStrategy $strategy */
            /** @var string $visibility */

            // First we try to search for exact file id string match
            foreach ([$publicSet, $protectedSet] as $set) {
                list($fs, $strategy, $visibility) = $set;

                // Get a FileID string based on the type of FileID
                $fileID =  $strategy->buildFileID($parsedFileID);

                if ($fs->has($fileID)) {

                    // We already have a ParsedFileID, we just need to set the matching file ID string
                    $closesureParsedFileID = $parsedFileID->setFileID($fileID);

                    $response = $callable(
                        $closesureParsedFileID,
                        $fs,
                        $strategy,
                        $visibility
                    );
                    if ($response !== false) {
                        return $response;
                    }
                }
            }

            // Let's fall back to using our FileResolution strategy to see if our FileID matches alternative formats
            foreach ([$publicSet, $protectedSet] as $set) {
                list($fs, $strategy, $visibility) = $set;

                $closesureParsedFileID = $strategy->searchForTuple($parsedFileID, $fs, $strictHashCheck);

                if ($closesureParsedFileID) {
                    $response = $callable($closesureParsedFileID, $fs, $strategy, $visibility);
                    if ($response !== false) {
                        return $response;
                    }
                }
            }

            return null;
        }

        public function exists($filename, $hash, $variant = null)
        {
            if (empty($filename) || empty($hash)) {
                return false;
            }

            // If `applyToFileOnFilesystem` calls our closure we'll know for sure that a file exists
            return $this->applyToFileOnFilesystem(
                function (ParsedFileID $parsedFileID, Filesystem $fs, FileResolutionStrategy $strategy) use ($hash) {
                    $parsedFileID = $strategy->stripVariant($parsedFileID);

                    if ($parsedFileID && $originalFileID = $parsedFileID->getFileID()) {
                        if ($fs->has($originalFileID)) {
                            return true;
                        }
                    }

                    return false;
                },
                new ParsedFileID($filename, $hash, $variant)
            ) ?: false;
        }
    }

Note this is editing the SS 5 version of the file but you could do something similar with v4 on your local project.

samandeggs commented 9 months ago

@obj63mc Okay, fair. I'll take a look at doing the exact same thing on my version of things, as this is something I mentioned higher up that should be happening (the concept of assuming the files exist, since they're on S3).

Edit: those functions were identical, sans the removal of the hash items - so if this works, and the hashing isn't seen as necessary, I suspect we could easily write a PR for the SS4 version of this, even if it's included as an option like skip_hash, and make it official for SS4 as well. Will test.

wilr commented 1 month ago

I've got a PR which covers some of this feedback (though not all of it) at https://github.com/silverstripe/silverstripe-s3/pull/70.

Would love everyone's feedback!