chillu closed this issue 7 years ago
Could we simply not re-generate the thumbnail on upload, and re-use the in-memory canvas in the interim? I realise this doesn't solve the issue completely, but it does ensure that you don't end up with a massive spike of dynamic requests immediately on upload (but rather delays it until the next time the image is requested from the server).
I would look at this plus one of the above suggestions as a complete solution.
I don't think only solving "load spike on upload" warrants the additional complexity - most of the user impact will be in browsing through assets, rather than uploading large amounts of them (less frequent user action). Thumbnail generation makes the problem worse (longer load times on the GraphQL response), but if we avoid it on upload, it'll hit the user and server the next time the folder is loaded - just as bad IMHO.
With my suggestion in place, thumbnail generation would still be performed on upload, but the React code would not re-request the URL on successful upload. I.e. in the presence of both a local canvas and a remote URL, it would err in favour of retaining the local canvas cache.
Although I like opt 1 - @chillu, is this dangerous at all if the file contains sensitive information and is left published for, say, 10 seconds before becoming protected?
Side note: I have mentioned this before that for the files area it might be better to have the "save as draft" inside of the more options pop-up rather than the main action so that there is just the main action of publishing changes (this approach fits with auto save if we were to ever do it). Opt 1 would fit this approach to the UI nicely.
@chillu is this dangerous at all if the file contains sensitive information if it is left for say 10 sec published before becoming protected?
I would also have this concern; It would mean that there is no way to safely upload a sensitive file that is NEVER public.
I suggest still having the initial upload request generate the thumbnail, as this could be a memory intensive operation for certain files. Trying to resample multiple images in a single request would be the highest performance bottleneck.
Once all uploads are complete, you could assume that all thumbnails have been generated in the backend, and at that point you could potentially trigger thumbnails to be requested (option 3), or not until those assets are requested at a later time (my suggestion).
The "save as draft" button within the assets area is actually pretty useless in most cases as you are not normally viewing the file in its live context as presented on page(s) directly. The "Save" button does give you the option to go and check those pages before publishing but I doubt people would bother.
Option 1 defeats most of the value of asset staging/versioning. -100 points from me.
EDIT: However, it would be okay to add something to the upload UI that could let users optionally choose to "immediately publish these files so that they are publicly accessible"
We're conflating two different problems here:
`<img>` HTTP requests overload the webserver with 50 somewhat concurrent dynamic requests, none of which generate thumbnails - that's always done in GraphQL requests. So the 50 dynamic requests are the main issue I'm raising here. The problem with allowing users to opt in or out of "publish immediately" is that a user action might still make the website unavailable. If a browser fires up to 13 concurrent dynamic requests to a webserver, actual website page views outside of the CMS will get queued until after they're completed. Best case: 13 concurrent requests returning in 150ms each = additional ~0.6s load time ((50/13)*150ms). Worst case: 4 concurrent requests returning in 500ms each = additional ~6.2s load time ((50/4)*500ms).
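The arithmetic behind those estimates can be sketched as a simple queueing model (a rough approximation that assumes requests are spread evenly across the available connections):

```php
<?php
// Rough model: N requests spread across C concurrent connections,
// each taking T ms, add roughly (N / C) * T ms of load time before
// the last thumbnail arrives.
function extraLoadTimeMs(int $requests, int $concurrency, int $msPerRequest): float
{
    return ($requests / $concurrency) * $msPerRequest;
}

echo extraLoadTimeMs(50, 13, 150) . "\n"; // best case: ~577ms
echo extraLoadTimeMs(50, 4, 500) . "\n";  // worst case: 6250ms
```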
OK, best to ignore Option 1 and focus on Option 2-4 then.
Option 2 is really a special case of Option 4 and is probably the best road to go down; Option 3 feels like a bit of a band aid that still leaves other uses of unpublished assets exposed.
Option 4? I mean, it is needed one way or the other; was it not part of the 4.0 roadmap?
Option 4 (Speeding up bootstrap) will be a lot of work, with major areas like the Config API needing an overhaul, replacing extensions with traits, breaking a lot of APIs by more cleanly decoupling them (e.g. a standalone ORM). It's an ongoing effort, but from my understanding not a major focus of 4.x.
And frankly I wouldn't know how much we can chop out of the `ProtectedFileController` logic - it uses pretty much the whole gamut of core functionality (YAML config, routing, controller stack, ORM, flysystem lib). Best case scenario we get it down from 200ms to 100ms. That still means you tie up close to a dozen web workers for a few seconds, each with ~50MB of PHP memory minimum. Depending on max workers and memory available in your environment, this might be anywhere from 25% to 100% of your web server capacity. So this needs to be a LOT faster/lightweight to be useful.
It probably needs to be decoupled from the ORM in some way. The system already sets marker values in the session to indicate which protected files the current session has access to; this should be done in a way that lets the fetching of the protected asset happen with a minimum of extraneous calls.
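As a sketch of that direction: the grant data already lives in the PHP session, so a stripped-down handler only needs the session, not the ORM. The session key and value format below are illustrative, not the actual FlysystemAssetStore layout:

```php
<?php
// Hypothetical lightweight grant check: read the marker values the
// asset store left in the session, without booting config or the ORM.
session_start();

function isGranted(string $fileId): bool
{
    $grants = $_SESSION['AssetStore_Grants'] ?? []; // illustrative key name
    return in_array($fileId, $grants, true);
}

if (!isGranted($_GET['file'] ?? '')) {
    http_response_code(403);
    exit;
}
// ...stream the file from assets/.protected/ here...
```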
@tractorcow do you have any recommendations for how we could approach this?
A few thoughts:

- `/assets/*` could be wired to route to `asset-handler.php` instead of `main.php`, with a shorter execution flow.
- `asset-handler.php` would still need to load Injector/config in its entirety to reconfigure assets, which isn't ideal. If this is demonstrated as being a big time-saver, we might want to investigate some kind of generate-code-from-Injector approach that essentially compiles a given service definition into a few lines of PHP that doesn't need to have Injector or config loaded in order to access it.
- The access check currently calls both `has()` and `isGranted()`. We should only perform an `isGranted()` check and leave it at that.
- `flysystem->read()` (FlysystemAssetStore line 857)

I haven't done profiling of a protected asset request, so that should be done first.
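The routing change in the first thought above could be as small as a rewrite rule, e.g. for Apache (assuming a hypothetical `asset-handler.php` entry script next to `main.php`):

```apache
# Send protected asset requests to a lightweight entry script
# instead of the full SilverStripe bootstrap in main.php
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/assets/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* asset-handler.php [L]
```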
FYI Ingo it doesn't look like ProtectedAssetController makes use of the database.
The only thing this controller requires access to is session and config. That does mean we can avoid making a DB connection. We could set up a separate main.php which skips db::connect(), and route protected assets through it.
As per your last point @sminnee , yes I want to do buffered responses for these files. :) I started working on this a few months ago, but we dropped the priority a bit.
OK, had a good session with Sam for the last hour - we managed to get his script working with protected files: https://gist.github.com/chillu/e8811357a2cd290d88408de3e34b18fb.
It replaces the `Config` layer with a simpler version that hardcodes the required config keys for this particular execution path. It's a purpose-built script, so those savings won't translate to other SilverStripe executions, but it's encouraging!
Looking at XHProf when requesting a protected file (on PHP 5.6):
- Standard routing: 110ms, 12MB RAM
- test-file.php: 25ms, 3MB RAM
Next steps:
- Process `url` in the same way as `main.php` does
- Avoid loading the `File` class (and all related gunk), by bypassing or replacing the `File::join_paths()` use in `ProtectedFileController`
Longer term steps (out of scope here):
Ideally Michael Strong's new config implementation would provide some classes that would implement the stub implementation I did in that gist.
It may be worth pushing that forward for an alpha5 goal
I've left some ideas on reducing reliance on ProtectedFileController within asset-admin at https://github.com/silverstripe/silverstripe-asset-admin/issues/358#issuecomment-286293854.
Comment from Damian on https://github.com/silverstripe/silverstripe-asset-admin/issues/358
I was thinking about thumbnails / image optimisation
I think we need an API that allows us to get existing manipulations (e.g. padded, resized), one which ONLY returns already-generated images.
At the moment, for example, `$Image.Pad(100, 100)` gets a padded image, and generates it if it doesn't already exist.
We probably need `$Image.ExistingOnly.Pad(100, 100)`, which only gets the padded version if one is found. This is important because when we retrieve a list of 30 images per request, we cannot risk loading 30 images into memory and lazy-generating thumbnails. Of those 30 images, some may not have thumbnails yet, and some other process (outside the initial pagination query, e.g. a separate endpoint) would regenerate those.
Secondly, for the request for 50 images, we should (for thumbnails that exist) ensure that protected image thumbnails are sent as base64-encoded blobs in the parent request, to avoid unnecessary requests to ProtectedFileController.
so there are four requests that change:
The goals of this are to:
Rough pseudocode for my ExistingOnly method.
public function ExistingOnly()
{
    $dbFile = DBFile::create();
    $dbFile->Filename = $this->Filename;
    $dbFile->Hash = $this->Hash;
    $dbFile->Variant = $this->Variant;
    $dbFile->canGenerate = false;
    return $dbFile;
}
With a check in ImageManipulation::manipulate() on the canGenerate property.
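A minimal sketch of that check (the `manipulate()` body here is simplified, and the `exists()` lookup stands in for the actual AssetStore API call):

```php
<?php
// Inside ImageManipulation::manipulate(): bail out before generating
// a variant when the DBFile was created via ExistingOnly().
public function manipulate($variant, $callback)
{
    $store = Injector::inst()->get(AssetStore::class);
    $exists = $store->exists($this->Filename, $this->Hash, $variant);
    if (!$exists && empty($this->canGenerate)) {
        // No pre-generated thumbnail, and generation is disabled:
        // let the caller fall back to a placeholder or deferred endpoint.
        return null;
    }
    // ...existing generation logic...
}
```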
Comment from Paul on https://github.com/silverstripe/silverstripe-asset-admin/issues/358
An observation more than an issue: pagination only kicks in after the files area is reloaded. You can upload 1000 files and they will all show on one page; if you navigate away and back again, they are limited per page through pagination.
Copied and clarified from my linked comment at https://github.com/silverstripe/silverstripe-asset-admin/issues/358#issuecomment-286293854
We need an API that allows us to get existing manipulations (e.g. padded, resized) which ONLY returns already-generated images. At the moment, for example, `$Image.Pad(100, 100)` gets a padded image, and generates it if it doesn't already exist.
We need `$Image.ExistingOnly.Pad(100, 100)`, which only gets the padded version if one is found. This is important because when we retrieve a list of 30 images per request, we cannot risk loading 30 images into memory and lazy-generating thumbnails. Of those 30 images, some may not have thumbnails yet, and some other process (outside the initial pagination query, e.g. a separate endpoint) would regenerate those.
Secondly, for the request to 50 images, we should (for thumbnails that exist) ensure that protected image thumbnails are sent via base64 encoded blobs in the parent request, to avoid unnecessary requests to the protectedfilecontroller.
so there are four requests that change:
Protected thumbnails (where they exist) would be returned inline as a `data:image/jpeg;base64, XXXX` string.

The goals of this are to:
Make the UX slightly more responsive
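Building the inline data URI for an existing protected thumbnail is cheap once its bytes are in hand. A sketch, assuming `getMimeType()` and `getString()` from the AssetContainer interface that `DBFile` implements:

```php
<?php
// Inline a pre-generated thumbnail into the GraphQL response as a
// data URI, avoiding a follow-up request to ProtectedFileController.
function thumbnailDataUri(DBFile $thumbnail): string
{
    return sprintf(
        'data:%s;base64,%s',
        $thumbnail->getMimeType(),
        base64_encode($thumbnail->getString())
    );
}
```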
Acceptance Criteria:
Looking over the ACs in the description, and the additional/alternative ACs proposed by Damian in the comment above, I think we've covered this now.
We never return thumbnail information from the server to the client that is already available on the client.
Technically we're still returning thumbnail data from the server on reloads, it's just the first upload that keeps the data on the client (via the Upload JS lib). But I don't think that's a big deal, I don't want to get into custom cache management for data-urls and inline images (for protected assets)
Thumbnails for no more than one image are ever regenerated per request.
That's true in AssetAdmin now, but you can still overload a webserver by resizing a large amount of images in your own templates. That's no different from SilverStripe 3 though.
When an image has a missing thumbnail, a fallback handler is available that can lazy-generate the thumbnail for that image and return it.
I've deleted thumbnails after they were auto-generated, and confirmed this is the case. It's not "lazy", but rather inlined in the GraphQL request. So that's an edge case which could still cause large PHP memory use and exec time within a GraphQL request. Asset folders upgraded from 3.x through `MigrateFileTask` won't have those thumbnails, but will auto-publish all files. So the GraphQL call will generate a max of ~20 image thumbnails by default (based on pagination). I've written this up as a smaller, focused task: https://github.com/silverstripe/silverstripe-asset-admin/issues/626
@chillu I've re-opened this because we found it necessary to fix thumbnail performance. @flamerohr and I have implemented this.
Acceptance Criteria
PRs
Notes
Related
Background
WAS: Browsing unpublished assets can cause dozens of dynamic requests
In the new SS4 asset implementation, unpublished assets aren't directly accessible to the browser via URL. Instead, they're hidden away in an `assets/.protected/` folder, and routed through a SilverStripe request to ProtectedFileController. This controller checks whether access to the file has been granted, based on previous calls to FlysystemAssetStore->getAsURL(). These grants are stored in the PHP session. Files might come from the local filesystem, or through network access (e.g. from AWS S3).

So far, so good. But in reality, this means that anybody uploading a few dozen files will view a lot of protected files right after the upload. Our current pagination limit for `admin/assets` is 50, so that's at least 50 expensive dynamic requests in very quick succession, just to show a single screen of thumbnails. Browsers open up to 13 concurrent network connections per hostname. Not only does this significantly affect the perceived loading performance of `admin/assets`, but it's also likely to cause rate limiting or server process crashes. For uncached network requests (e.g. to AWS S3), every request will also stall for hundreds of milliseconds - which further depletes the pool of web workers available for the website. So if left as-is, a single CMS author will regularly bring down websites just by opening `admin/assets`.

While we could lower the page size in `admin/assets`, this doesn't address the root of the problem (20 expensive dynamic requests caused by a single user interaction are still too many).

Option 1: Publish files immediately after upload (and make "protected files" opt-in)
Option 2: Create a more lightweight version of `ProtectedFileController` outside of the SilverStripe bootstrap (hard because it relies on YAML config and lots of other fun things in SilverStripe core)
Option 3: Intercept image requests and batch them into a single call
Option 4: Speed up the SilverStripe bootstrap #troll
I've purposely filed this ticket in framework as opposed to asset-admin, since asset-admin just surfaces one variation of the problem. Viewing an unpublished gallery page, or a page listing a bunch of documents in custom code, has the same effect.
Somewhat related: it'll make it a lot more painful to run a SilverStripe site through https://github.com/silverstripe/silverstripe-serve, since `php -S` is single-threaded and single-connection - so you're working off those 50 image requests sequentially. With ~300ms response times, that's 15s of total load time, on top of the initial CMS response times. And consequently, it'll make it veeeery slow to run Behat tests this way, assuming Selenium waits for `window.onload` rather than `document.onload`.
There's a separate task for improving performance (memory usage) on streaming files: https://github.com/silverstripe/silverstripe-framework/issues/5462
Once this is solved, increase pagination size from 15 to 50 items in AssetAdmin (originally tracked under https://github.com/silverstripe/silverstripe-asset-admin/issues/358).
/cc @sminnee @tractorcow