wagtail / wagtail

A Django content management system focused on flexibility and user experience
https://wagtail.org
BSD 3-Clause "New" or "Revised" License

Improve UX/Performance of image overview and chooser #11330

Open robmoorman opened 11 months ago

robmoorman commented 11 months ago

Is your proposal related to a problem?

When content editors upload large sets of images (especially high-res ones), Wagtail needs to create renditions. After upload, the overview (and choosers) take too long to load by default, because all renditions need to be created first. This often causes 504 errors (timeouts) and makes images unavailable.

These issues pop up especially on lighter hardware. We like to keep hardware light, and speccing up the hardware just for this flow feels a bit "overdone".

We can optimize this by rendering the page immediately and loading the images over the wagtailimages_serve endpoint.
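
For context, the dynamic serve view isn't enabled by default; the route below follows the URL pattern from the Wagtail documentation (the `images/` prefix is a project-level choice):

```python
# urls.py: enabling Wagtail's dynamic image serve view. The regex follows the
# pattern in the Wagtail docs; the "images/" prefix is our own choice.
from django.urls import re_path
from wagtail.images.views.serve import ServeView

urlpatterns = [
    re_path(
        r"^images/([^/]*)/(\d*)/([^/]*)/[^/]*$",
        ServeView.as_view(),
        name="wagtailimages_serve",
    ),
    # ... the rest of the project's URL patterns
]
```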

Stormheg commented 11 months ago

Related: https://github.com/wagtail/wagtail/issues/929

bobperik commented 11 months ago

Yes please!

Stormheg commented 11 months ago

As discussed in #929, this issue isn't specific to the image overview page and choosers; it also affects other pages that generate a lot of renditions. With the introduction of the {% picture %} tag in Wagtail 5.2, even more renditions are generated per image, which makes it more likely for people to run into this issue.

The current situation, where processing a request is delayed by the creation of missing renditions, is far from ideal and has been a long-standing issue in Wagtail: #929 was reported 9 years ago.

Creating renditions is blocking because we do not know what the URL of the resulting image file is going to be, so we can't render it in the template.
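
For illustration, the blocking step corresponds to Wagtail's get_rendition() call, which generates the file synchronously when no matching rendition exists (a simplified sketch, not the template tag's actual source):

```python
# Simplified illustration of the blocking behaviour described above.
from wagtail.images.models import Image

image = Image.objects.first()  # assumes at least one uploaded image
# If no matching rendition exists yet, get_rendition() resizes and writes the
# file right here, synchronously; the HTTP response waits for it to finish.
rendition = image.get_rendition("fill-300x200")
print(rendition.url)  # the URL is only known once the file has been written
```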

Proposing to generate a link to the wagtailimages_serve view when the rendition hasn't been created yet is an interesting idea, because we'd delay creating the rendition until it is actually requested by the visitor's browser. I'd expect there to still be significant server load, as the URLs would be requested in quick succession. But even if some images time out on overloaded hardware, the overview page itself should still render, just with a couple of broken images; that failure mode is less bad than the current situation, and I'd consider it an improvement.

The main challenge I see with this solution is caching. It would be inefficient if the wagtailimages_serve URL were used all the time. If the initial page render uses the wagtailimages_serve URL and that response is cached, all subsequent requests for the page would use the wagtailimages_serve URL too, even though a rendition is by then available.

ababic commented 11 months ago

@Stormheg just to clarify: the introduction of the picture tag doesn't necessarily mean that more renditions are being generated now than ever before. Developers previously had their own ways of achieving the same result. The difference is that the new picture tag achieves it with a single read operation per image, and utilises a thread pool to generate renditions simultaneously, making it more efficient and faster than before.

Stormheg commented 11 months ago

Thank you for that clarification @ababic 👍

robmoorman commented 11 months ago

@Stormheg 'Less bad' is a fair conclusion; the total server load will probably be about the same, just spread across successive requests. However, a page shown with a few timed-out, missing images doesn't block the interface.

I'd compare the situation to high-availability e-commerce, where you also don't want to block the checkout because of some JavaScript errors 😛.

ababic commented 11 months ago

Sorry, but I feel the suggested solution here doesn't make a lot of sense. When you think of the effort required for the server to respond to X additional requests for individual renditions, server overloads would surely be more likely under these conditions, not less. Plus: what's the real cost of continual server failures/restarts?

If the aim is to help Wagtail run more successfully on modest infrastructure, there are a number of measures that can be taken:

  1. Limiting the size of original images (whether user-uploaded or automatically imported)
  2. Keeping important pages free of unnecessary images
  3. Promoting/encouraging the use of previews amongst editors
  4. Pushing newly published page URLs to a Redis queue and having a management command request them at a rate the infrastructure is comfortable with (see the sketch below)
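
A rough sketch of measure 4, assuming a Redis list that is filled with URLs from a page_published signal handler (the command name, queue key, and request rate are illustrative, not an existing Wagtail feature):

```python
# management/commands/warm_renditions.py: hypothetical command for measure 4.
import time

import redis
import requests
from django.core.management.base import BaseCommand

QUEUE_KEY = "rendition_warmup"  # assumed Redis list, filled on page publish


class Command(BaseCommand):
    help = "Request queued page URLs so their renditions are pre-generated."

    def handle(self, *args, **options):
        client = redis.Redis()
        while True:
            url = client.lpop(QUEUE_KEY)
            if url is None:
                break  # queue drained
            # Rendering the page server-side creates any missing renditions.
            requests.get(url.decode(), timeout=60)
            time.sleep(2)  # throttle to a rate the infrastructure can handle
```
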
robmoorman commented 11 months ago

I think the overhead per request/image is worth it in order not to block the flow of a content editor. Also, the suggestion to reduce original image sizes conflicts with our vision of putting the content editor at the centre. We do want high-res images, right? :)

A basic Wagtail implementation hosted in a light/medium environment (512-2048 MB of memory?) would already run into these issues when uploading a handful (> 5) of high-res images (6000 px?).

From an architectural perspective, I would like to optimize this without relying too much on async machinery or extra infrastructure, as most use cases sit in this spectrum (no heavily specced hardware).

Images are getting larger and larger nowadays, and requirements like WebP/AVIF add to the load. Generating renditions is becoming more and more important.

Placing myself in the role of a content editor, I would rather see loaders in the image placeholders in the overview, indicating that something is in progress (timeouts may still happen, but the UX shouldn't surface them), than get a timeout for the complete page.

Rendering a lot of renditions in a single request doesn't feel like the right path here, IMO.

We actually see the same thing for previews after putting a page together with rich content (images). For those cases we also use image_url, so the process doesn't block the content editor. It improves their productivity.

Curious about your opinion on these findings, @ababic; eager to find a way we can help content editors here.

Stormheg commented 11 months ago

I don't agree with Andy's statement.

I'm not saying the suggested approach is the ideal solution to this problem. Far from it actually. Like I mentioned, the server will still be overloaded from the additional requests for renditions. And yes, overall it will be less efficient to generate a bunch of renditions via separate requests. From a technical perspective this isn't a 'proper' solution to the problem.

What we are looking for is an improvement to the UX, mainly for content editors. A chooser/overview page with a small number of images failing to load is an improvement over a timeout error after waiting a couple of moments for the page to load.

Limiting the size of uploaded images and/or reducing the number of images used isn't a fair trade-off, considering this issue only happens once (when the renditions aren't available yet).

The 'warming caches' approach that submits pages to a queue for an initial render pass is a potential workaround but requires extra infrastructure for the queue and worker. Not every project needs / can afford workers.

I do believe Wagtail could (and should) do better on modest infrastructure out of the box. I'll ponder over this problem some more.

ababic commented 11 months ago

@robmoorman @Stormheg

I think the overhead per request/image is worth it in order not to block the flow of a content editor.

I'm not in disagreement with improving things for the content editor; I simply think we need to work harder on a solution here. Adding something to Wagtail that has the potential to considerably ramp up server load more broadly simply isn't something I think the core team will be keen on. I see the likely solution taking the form of multiple different measures, with wider benefits.

Also, the suggestion to reduce original image sizes conflicts with our vision of putting the content editor at the centre.

NOTE: This can be done in such a way that the editor isn't even aware. It doesn't necessarily need to make their lives any more difficult.

A basic Wagtail implementation hosted in a light/medium environment (512-2048 MB of memory?) would already run into these issues when uploading a handful (> 5) of high-res images (6000 px?).

Are we talking server timeouts here or straight-up crashes? Something with 2048 MB of memory certainly shouldn't be choking on a couple of images, and extending the server timeout could well help with the timeouts.

The 'warming caches' approach that submits pages to a queue for an initial render pass is a potential workaround but requires extra infrastructure for the queue and worker. Not every project needs / can afford workers.

Without a worker, I would be even more worried about the increase in server load. How do you reliably run management commands/cron jobs if the server is crashing regularly?

gasman commented 11 months ago

To summarise, it sounds like what's being proposed is for some or all views within the admin to adopt a variant of the {% image %} tag mechanism for generating thumbnails, where instead of following the steps:

  • does a rendition already exist?
  • if not, create it (blocking the response while the image is processed)
  • return the rendition's URL

we do this:

  • does a rendition already exist?
  • if so, return its URL
  • if not, return the wagtailimages_serve URL for that image and rendition parameters

(side note: wagtailimages_serve is currently an optional feature, so we'd need to make it a requirement or use a separate endpoint designated for admin internal use only)

where wagtailimages_serve will go on to do this:

  • create the rendition if it doesn't already exist
  • serve (or redirect to) the rendition's image file

This will mean that any long-running image processing won't block the page load as a whole; it will result in a bunch of slow-loading images instead.
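
A minimal sketch of that second flow, assuming a helper along these lines (thumbnail_url and its default filter spec are illustrative, not Wagtail internals):

```python
# Illustrative helper for the flow summarised above; not Wagtail's actual code.
from wagtail.images.views.serve import generate_image_url


def thumbnail_url(image, filter_spec="max-165x165"):
    # Does a rendition already exist? (focal point variants ignored for brevity)
    rendition = image.renditions.filter(filter_spec=filter_spec).first()
    if rendition is not None:
        return rendition.url  # serve the already-generated file directly
    # No rendition yet: return a dynamic serve URL, deferring creation until
    # the browser actually requests the image.
    return generate_image_url(image, filter_spec)
```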

I'd discourage using this pattern on site front-ends (and it very definitely shouldn't become the standard behaviour of {% image %}) due to the concern @Stormheg raised about caching. More broadly, serving static files at scale is a hard problem that people have put lots of work and infrastructure into, and we shouldn't be doing an end-run around that infrastructure by serving them through Django instead.

I think it's fair game to do it within the admin, though. The admin backend shouldn't be behind a cache (if it is, you'll have bigger problems to solve), and as long as that's the case, the wagtailimages_serve route will only be used for the initial page load immediately after uploading the image. The only extra work incurred by Django is serving the file itself, and if that's a big deal we could even send a redirect to the newly-created static file instead.

As a refinement, maybe we could only take the wagtailimages_serve path if the original source image is above a certain threshold, e.g. 1000x1000, on the basis that smaller images can be dealt with on the spot without much of a time penalty on the original request.
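
That refinement could be a simple dimension check in front of the helper sketched above (the 1000px threshold is just the example value from this comment):

```python
# Hypothetical size check; only large originals take the deferred serve path.
def should_defer_rendition(image, threshold=1000):
    return image.width > threshold or image.height > threshold
```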

Stormheg commented 11 months ago

In reply to @ababic

A basic Wagtail implementation hosted in a light/medium environment (512-2048 MB of memory?) would already run into these issues when uploading a handful (> 5) of high-res images (6000 px?).

Are we talking server timeouts here or straight-up crashes? Something with 2048 MB of memory certainly shouldn't be choking on a couple of images, and extending the server timeout could well help with the timeouts.

(Clarification: I work with @robmoorman)

In our specific setup, the application is behind a proxy that cancels requests after waiting 30 seconds for a reply from Django and returns a timeout response to the user. No server crashes actually happen, but to our content editors it feels like a crash anyway.

The 'warming caches' approach that submits pages to a queue for an initial render pass is a potential workaround but requires extra infrastructure for the queue and worker. Not every project needs / can afford workers.

Without a worker, I would be even more worried about the increase in server load. How do you reliably run management commands/cron jobs if the server is crashing regularly?

I'm not sure what leads you to believe that we are having issues running management commands. Our server resources aren't stretched that thin.

This issue surrounding server resources only arises when a bunch of high-quality images are uploaded at once. This does not happen that often, but often enough for us to make a fuss about it here 😅


In reply to @gasman

does a rendition already exist?

  • if so, return its URL
  • if not, return the wagtailimages_serve URL for that image and rendition parameters

Yes, this is what is being described here as a solution. We understand each other 👍 😄

I'd discourage using this pattern on site front-ends (and it very definitely shouldn't become the standard behaviour of {% image %}) due to the concern @Stormheg raised about caching. More broadly, serving static files at scale is a hard problem that people have put lots of work and infrastructure into, and we shouldn't be doing an end-run around that infrastructure by serving them through Django instead.

The proposed solution definitely opens a huge can of worms when applied without consideration. Caching is a hard problem, and you are right that this may not be the right solution for many. For larger sites, for example, I would go for the 'warming caches' approach described here to avoid the can of worms. But that just isn't feasible when your clients are Dutch and on a budget 😛 (regardless of budget, we'd still like to provide our clients with a good experience)

I'd settle for only applying the proposed solution to the Wagtail admin – we can create our own template tag to handle images for the frontend and accept responsibility for any (caching) issues that may arise. If we find a better way to handle this, we'll be sure to report back.
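
The frontend tag mentioned here could take roughly this shape (hypothetical; lazy_image_url is not part of Wagtail, and it knowingly accepts the caching trade-off discussed above):

```python
# templatetags/lazy_images.py: hypothetical tag along the lines discussed above.
# A cached page may keep pointing at the serve view; we accept that trade-off.
from django import template
from wagtail.images.views.serve import generate_image_url

register = template.Library()


@register.simple_tag
def lazy_image_url(image, filter_spec):
    """Existing rendition URL if available, otherwise a dynamic serve URL."""
    if image is None:
        return ""
    rendition = image.renditions.filter(filter_spec=filter_spec).first()
    if rendition is not None:
        return rendition.url
    return generate_image_url(image, filter_spec)
```

In a template this would be used as {% load lazy_images %} followed by {% lazy_image_url page.hero_image "fill-800x450" %}.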

ababic commented 11 months ago

@Stormheg

I'm not sure what leads you to believe that we are having issues running management commands. Our server resources aren't stretched that thin.

Sorry, some confusion on my part. I believed we were talking about the servers crashing, rather than it being a simple timeout issue.