plone / volto

React-based frontend for the Plone Content Management System
https://demo.plone.org/
MIT License
429 stars 571 forks source link

PLIP: TUS reliable large uploads with simpler file upload UI in volto #5423

Open djay opened 6 months ago

djay commented 6 months ago

PLIP (Plone Improvement Proposal)

Abstract

This PLIP has three aims (and could be separated if desired)

Context

Motivation

Assumptions

Proposal & Implementation

TUS for contents upload

TUS for file fields/blocks

Replace Add > File

UI for "upload files"

TUS without temp folder

Deliverables

Additional work

Risks

OK/Cancel for "upload files"? ie finish upload/transactions when click save, not after picking.

Size of bundle with extra lib

File Widget

Participants

Context

Related optimisation on the backend to speed up the last part of a large upload

davisagli commented 4 months ago

@djay The linked images give 404s for me. Is it possible to share them as part of this issue?

At first glance, I like this direction. Thanks for all your notes.

I'd like to see the TUS backend in plone.restapi gain support for storing the temp files in the ZODB instead of raw files. That way it can also work when the ZODB is not using a shared filesystem blobstorage.

djay commented 4 months ago

@davisagli reuploaded the images.

I'd like to see the TUS backend in plone.restapi gain support for storing the temp files in the ZODB instead of raw files. That way it can also work when the ZODB is not using a shared filesystem blobstorage.

It's a good idea but I think it can be handled by a seperate PR or PLIP? It would double the storage needed until the DB is packed which is why it wasn't done this way originally but I think thats fine for a default unoptimised system.

davisagli commented 4 months ago

@djay I would say optimized for horizontal scalability rather than optimized for performance and minimum disk use, but fair point that there's a tradeoff there. And sure, it can be done separately from this PLIP. But it's related in that if the frontend can count on the backend to always support the TUS protocol, then it might simplify the frontend by not needing multiple file upload widgets.

djay commented 4 months ago

@davisagli good point. I'll put it in there

djay commented 1 month ago

@davisagli I'd have to check if the protocol supports it but if TUS allows a single chunk in a single api request then we could possibly not have to handle combining them on the backend for the default setup. So then we use the same UI and same client code and still sending via TUS endpoint but the space and transactions used on the backend are the same as without TUS. EDIT: yes it seems to support this - https://tus.io/protocols/resumable-upload#creation-with-upload

runyaga commented 2 weeks ago

Our usage:

Concerns:

Other:

djay commented 2 weeks ago

@runyaga TUS was in classic in plone 4. It seems to have got lost along the way - https://github.com/plone/plone.app.content/issues/64 but it would not be hard to bring back.

That implimentation required a temp folder that all instances had access to, ie a shared file system.

Do we want to the TUS implementation to be in Plone proper? I see the benefit, but would/could a first step be integrating the TUS reference implementation? This is a different set of trade offs but the upside is we dont have to maintain the server portion (at least initially).

There is no real trade off if TUS can be set single api call in cases where a shared file system is not available. Then it will not be any different from a normal upload other than being able to be retried and having progress in the UI. I agree with @davisagli that its better to have one implemenation that gets maintained rather than two. As you can see from history, niche code that is only enabled by a few gets forgotten and left out.

This PLIP talks about uploading files but I believe users are expecting to be able to upload directories and/or even nested folders/files.

yes it should possibly deal with that. I think from memory you need to use additional browser apis to get information about folders but I might be wrong. Certainly our implimentation didn't deal with recreating folder structures. Another issue that I think a folder has no mime type so I'm not sure how if there is a way in plone to say what the default content type is for a folder in the case of folderless plone like volto is.

Any concurrency of multiple uploads could potentially reek havoc with transactions

The old and new TUS implimentations upload chunks to a temp folder in a shared filesystem and don't do any transactions until it is finally combined. @davisagli suggested that if no shared filesystem was available then maybe it the chunks could be stored in the ZODB which would then have problems with lots of transactions. I'd argue its safer in such a case to go back to not using chunks and use a single api call upload. You have to reupload the whole thing again on disconnection but I think the transactions problem is worse. TUS can be configured to do parallel uploads but we didn't use it that way adn the old classic way didn't do this either. It wasn't slow.

Big wins are parallelize chunks of files. If we have to send one file at a time its going to be nightmarishly slow.

None of the implementations so far did this. If it was enabled then you'd have to ensure that the deployment config could set a maximum so you could ensure you don't overload your own site. But in theory you could enable this.

Prior conversations with @davisagli was that the Folder Contents changes to integrate something like this is fairly invasive. I would suggest this be worked out first. Drag n Drop into Folder Contents is the primary mode people expect to use the system.

The PLIP is to change the upload view and file add. It could be extended to so a drop on file contents activted the upload view.

People don't expect to "Add File" or even "Click to perform Bulk Upload".

I agree the current upload button is too well hidden. The current "Add file" is much more visible and I'd argue where begginers a led when they want to upload files and hense the reason to remove it and replace it with "Upload files" so that misguided user journey is fixed. But no harm in having DND in folder contents also.

We have some code for volto but it uses another design system. So if we get time we can try to put in a PR. We have to first

if anyone is motivated to work on that before we get to it we will happily share the branch we have.

runyaga commented 1 week ago

Do we want to the TUS implementation to be in Plone proper? I see the benefit, but would/could a first step be integrating the TUS reference implementation? This is a different set of trade offs but the upside is we dont have to maintain the server portion (at least initially).

There is no real trade off if TUS can be set single api call in cases where a shared file system is not available. Then it will not be any different from a normal upload other than being able to be retried and having progress in the UI. I agree with @davisagli that it's better to have one implementation that gets maintained rather than two. As you can see from history, niche code that is only enabled by a few gets forgotten and left out.

I do not see a direct answer to the proposition "Could a first step be integrating with the TUS reference impl?" I was thinking about requiring the reference TUS go container to be running alongside Plone. Then the integration really becomes "Use the TUS reference implementation" and we bundle a set of webhooks on the Plone side that know how to work with tusd which, to me, seems like an easier problem to solve.

Some huge positives:

My understanding:

@djay mind sharing me to your TUSD branch?

djay commented 1 week ago

@runyaga plone restapi supports TUS and has done for some time. https://plonerestapi.readthedocs.io/en/querysources/tusupload.html

The problem is not on the server side. The problem is that both classic and volto don't yet have client side clients.

"in-Plone" uploads participate in ZODB transactions and block at least 1 thread per concurrent file

This is not true. The rest-api current implimentation doesn't do any transactions until the chunks are all uploaded. Instead each request is put into a shared temp folder and only the last TUS upload request results in one big transaction.

There is not clarity on the UI/UX implementation; basically any place where a File widget can exist there is a desire to make it work with TUS toggled on/off (seems like a large ask?)

The first step is to have it working with the contents upload. Working out how to have it working with other file widgets is a harder subsequent task.

The desire is to not have TUS turned on or off but always use it but perhaps change how the transactions work depending on how its deployed since having a shared temp folder is not always possible.

We will share our branch but it's mixed in with a bunch of other stuff right now so we just have to find time to seperate out the volto UI code that just deals with TUS and progress.

djay commented 1 day ago

@davisagli @sneridagh if this was done with uploady library instead of js-tus-client do you think it would be accepted? We have a implementation using that which just needs some polishing and looking at what uploady provides, using js-tus-client might involve adding what uploady already has. Size wise it doesn't seem so much different. But all depends what you want in core.