wagtail / wagtail-transfer

Content transfer for Wagtail
https://wagtail.github.io/wagtail-transfer/
BSD 3-Clause "New" or "Revised" License

Large transfers (especially involving media) can cause server-level timeouts #58

Open stevejalim opened 4 years ago

stevejalim commented 4 years ago

(I've discussed this loosely with Matt and Jacob in Slack, but writing it up here)

When a site is hosted on a platform that imposes a hard, non-configurable limit on how long an HTTP request can take (eg 30 seconds), a transfer involving a sizeable video, or several other media files, can easily exceed that limit. The timeout kills the transfer. Pages are rolled back, but third-party models (eg wagtailmedia) can be left in an indeterminate state, with files already written to disk or storage somewhere.

The timeout happens because the whole Wagtail Transfer (WT) import takes place within a single HTTP request. Transferring an asset file inside that request-response cycle adds the time taken to copy the file to the request's duration.

This problem is exacerbated when media files are stored in cloud storage, which is common for many PaaS setups.

eg:

Destination Server -> asks Source Server -> asks Source's Storage for file -> Source's Storage returns file to Source Server -> Source Server sends file to Destination Server -> Destination Server stores file in Destination's Storage.

So that's the same file being processed (read or written) 3 or 4 times, depending on upload spooling.
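The back-of-envelope model below shows why a 30-second limit is easy to hit. It assumes the hops run one after another with no streaming overlap, and it uses made-up per-hop throughputs rather than measurements. `Hop` and `estimate_transfer_seconds` are hypothetical names, not part of Wagtail Transfer.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hop:
    description: str
    throughput_mb_per_s: float  # assumed sustained throughput (MB/s), not measured


def estimate_transfer_seconds(file_size_mb: float, hops: Sequence[Hop]) -> float:
    """Rough time to push one file through every hop in turn.

    Pessimistic simplification: each hop finishes before the next one starts.
    """
    return sum(file_size_mb / hop.throughput_mb_per_s for hop in hops)


# The chain described in the issue, with hypothetical throughputs.
HOPS = [
    Hop("Source storage -> Source server", 50.0),
    Hop("Source server -> Destination server", 20.0),
    Hop("Destination server -> Destination storage", 50.0),
]

REQUEST_TIMEOUT_S = 30.0  # eg a hard, non-configurable platform limit

if __name__ == "__main__":
    size_mb = 500.0  # a "sizeable video"
    total = estimate_transfer_seconds(size_mb, HOPS)
    verdict = "exceeds" if total > REQUEST_TIMEOUT_S else "fits within"
    print(f"{size_mb:.0f} MB: ~{total:.1f}s, {verdict} the {REQUEST_TIMEOUT_S:.0f}s limit")
    # prints: 500 MB: ~45.0s, exceeds the 30s limit
```

Upload spooling on the destination would add one more local write to the chain. The sketch leaves that out, so real timings could be worse.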

Possible solutions


(Separate from all the above, it would be nice to have a pre-flight check before a transfer to warn about large files that will be sent over)
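A pre-flight check could be as simple as listing the files a transfer would copy and flagging the large ones before anything starts. The minimal Python sketch below assumes file names and sizes are already known, for example from the source site's storage metadata. `AssetInfo`, `preflight_warnings` and the 50 MiB threshold are hypothetical and not part of Wagtail Transfer.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIB = 1024 * 1024

# Hypothetical warning threshold; tune to the platform's timeout and bandwidth.
LARGE_FILE_THRESHOLD_BYTES = 50 * MIB


@dataclass
class AssetInfo:
    # Hypothetical description of one file that a transfer would copy.
    name: str
    size_bytes: int


def preflight_warnings(assets: Iterable[AssetInfo],
                       threshold: int = LARGE_FILE_THRESHOLD_BYTES) -> List[str]:
    """Return one warning per asset at or above the threshold.

    If anything is flagged, a final line summarises the total size to transfer.
    """
    assets = list(assets)
    warnings = [
        f"{a.name} is {a.size_bytes / MIB:.1f} MiB"
        for a in assets
        if a.size_bytes >= threshold
    ]
    if warnings:
        total = sum(a.size_bytes for a in assets)
        warnings.append(f"Total to transfer: {total / MIB:.1f} MiB")
    return warnings


if __name__ == "__main__":
    planned = [AssetInfo("intro.mp4", 300 * MIB), AssetInfo("logo.png", 40 * 1024)]
    for line in preflight_warnings(planned):
        print(line)
    # prints:
    # intro.mp4 is 300.0 MiB
    # Total to transfer: 300.0 MiB
```

Returning plain strings keeps the check independent of any UI. A real integration would need to decide where these warnings are shown before the import request is sent.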

stevejalim commented 3 years ago

I've done some work on direct S3-to-S3 copying using a custom field adapter. It isn't in production yet, but it seems pretty reliable within some known constraints (eg it only works for data with a public-read policy). If anyone's interested, the code is open source and I can point you at the relevant bits of the implementation.
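The minimal sketch below shows what a direct S3-to-S3 copy can look like using boto3's managed `copy` method on an S3 client. This is not the great-cms implementation, and it is not wired into a Wagtail Transfer field adapter. `parse_s3_url` and `copy_public_object` are hypothetical helpers. The sketch assumes virtual-hosted-style S3 URLs and assumes the destination's credentials can read the source object, which a public-read policy would allow.

```python
import re
from typing import Any, Optional, Tuple
from urllib.parse import unquote, urlparse

# Matches hosts such as my-bucket.s3.amazonaws.com,
# my-bucket.s3.eu-west-2.amazonaws.com and legacy my-bucket.s3-eu-west-1.amazonaws.com.
_S3_HOST = re.compile(r"^(?P<bucket>.+)\.s3(?:[.-][a-z0-9-]+)?\.amazonaws\.com$")


def parse_s3_url(url: str) -> Tuple[str, str]:
    """Split a virtual-hosted-style S3 URL into (bucket, key).

    Path-style URLs and custom domains (eg a CDN in front of the bucket) are not handled.
    """
    parsed = urlparse(url)
    match = _S3_HOST.match(parsed.hostname or "")
    key = unquote(parsed.path.lstrip("/"))
    if match is None or not key:
        raise ValueError(f"Not a virtual-hosted S3 object URL: {url}")
    return match.group("bucket"), key


def copy_public_object(source_url: str, dest_bucket: str, dest_key: str,
                       client: Optional[Any] = None) -> None:
    """Copy an object within S3, so its bytes never pass through either Wagtail server.

    The caller still blocks until S3 finishes. The file itself does not go
    through either web server.
    """
    if client is None:
        import boto3  # lazy import: parse_s3_url works without boto3 installed
        client = boto3.client("s3")
    src_bucket, src_key = parse_s3_url(source_url)
    # S3.Client.copy is boto3's managed copy. It uses multipart copy for large
    # objects, so it is not limited by the 5 GB cap of a single copy_object call.
    client.copy({"Bucket": src_bucket, "Key": src_key}, dest_bucket, dest_key)
```

A call would look like `copy_public_object("https://source-bucket.s3.eu-west-2.amazonaws.com/media/clip.mp4", "destination-bucket", "media/clip.mp4")`. The bucket names and key here are placeholders. Passing `client` explicitly makes the function easy to test with a stub.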

If there's appetite for making this part of Wagtail Transfer, @jacobtoppm, I'd be happy to do that when I have time.

easherma-truth commented 2 years ago

@stevejalim I'd be interested in taking a look if it still works and is up somewhere!

stevejalim commented 2 years ago

Hi @easherma-truth, I've moved on from the org where I was using it, but it looks like the code is still there: https://github.com/uktrade/great-cms/blob/develop/core/wagtail_hooks.py#L135