owncloud / ocis

ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0

download of a large folder only starts after the archive has been created #10242

Open butonic opened 1 month ago

butonic commented 1 month ago

When downloading a folder via the archive download, the web UI shows an activity bar at the top after making the request to the server. Unfortunately, the download does not start streaming immediately; it seems to hang until the archive has been fully assembled on the server side. This may cause a proxy in between to kill the connection. Furthermore, a user might be confused because nothing is happening (apart from the activity bar at the top).

We should either start streaming immediately or show a notification that explains that the server is preparing the download ... but that should actually be async ... and a completely different API for downloads than what we currently have ... so ... we should just stream.
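To illustrate the "just stream" option, here is a minimal Go sketch, not the actual ocis archiver code. It assumes the archive is a zip and that entries can be read one at a time; `entry`, `streamZip`, and `archiveNames` are hypothetical names introduced only for this example. The point is that `archive/zip` writes each entry to the underlying writer as it is added, so the response body could start flowing before the last file has been read.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// entry is a hypothetical stand-in for one file of the folder being downloaded.
type entry struct {
	Name    string
	Content io.Reader
}

// streamZip copies every entry into a zip archive written directly to w.
// Nothing is assembled up front: bytes are handed to w while entries are
// still being read. It returns how many entries were written before an error.
func streamZip(w io.Writer, entries []entry) (int, error) {
	zw := zip.NewWriter(w)
	for i, e := range entries {
		fw, err := zw.Create(e.Name)
		if err != nil {
			return i, err
		}
		if _, err := io.Copy(fw, e.Content); err != nil {
			return i, err
		}
		// Push the zip writer's buffered bytes to w after each entry.
		if err := zw.Flush(); err != nil {
			return i, err
		}
	}
	// Close writes the central directory that makes the archive valid.
	return len(entries), zw.Close()
}

// archiveNames streams the named entries into memory and reads the archive
// back, returning the entry names in archive order (demo helper only).
func archiveNames(names []string, content string) ([]string, error) {
	var buf bytes.Buffer
	entries := make([]entry, 0, len(names))
	for _, n := range names {
		entries = append(entries, entry{Name: n, Content: strings.NewReader(content)})
	}
	if _, err := streamZip(&buf, entries); err != nil {
		return nil, err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	var got []string
	for _, f := range zr.File {
		got = append(got, f.Name)
	}
	return got, nil
}

func main() {
	got, err := archiveNames([]string{"a.txt", "dir/b.txt"}, "hello")
	fmt.Println(got, err)
}
```

In an HTTP handler, `w` would be the response writer. Note that once the first bytes are sent, the status code can no longer be changed, so a failure partway through can only be signalled by aborting the transfer.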

kulmann commented 1 month ago

I would also prefer it async... see https://github.com/owncloud/web/issues/10501#issuecomment-2357798447 (the issue contains quite a bit of input on what you wrote down here as well). It would make people, including myself, very happy if we had a proper solution here which doesn't die, actually compresses the content and is async. Most of all, the harsh archiver limitations (number of files and total file size) make it pretty much unusable.

jvillafanez commented 1 month ago

Streaming seems like a short-term solution, assuming we can stream the archive right away. However, I don't think it's a good solution.

Let's say you want to archive a folder which contains 100 files spread across multiple folders. You start streaming the archive right away with no waiting time (as far as I know, this should be possible at least for ".zip" and ".tar" files). However, an error happens while streaming the 47th file (the file is locked, a random I/O error...). In that scenario, there is nothing we can do:


A "JobQueue" service might be a nice solution, and could also provide additional features that could be interesting to implement in the future.

The job queue is intended to be per user, and limited to 2-5 running jobs per user, with a maximum limit of maybe 50 running jobs overall (all parameters configurable). The API can contain common methods such as "create/queue job", "list jobs", "check job status/progress", "remove job". Web can provide a nice UI for all these methods so users can control their own job queue.
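A rough Go sketch of the bookkeeping such a queue would need, assuming a single in-memory process. All names here (`JobQueue`, `Create`, `List`, `Status`, `Remove`) are hypothetical and only mirror the methods listed above; the sketch tracks limits and states but does not execute any job.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type JobStatus string

const (
	Queued  JobStatus = "queued"
	Running JobStatus = "running"
)

var ErrNotFound = errors.New("job not found")

type Job struct {
	ID     int
	User   string
	Kind   string // e.g. "archive"
	Status JobStatus
}

// JobQueue enforces a per-user and a global limit on running jobs.
type JobQueue struct {
	mu      sync.Mutex
	perUser int
	global  int
	nextID  int
	jobs    map[int]*Job
}

func NewJobQueue(perUser, global int) *JobQueue {
	return &JobQueue{perUser: perUser, global: global, jobs: map[int]*Job{}}
}

// running counts running jobs for one user and overall. Caller holds mu.
func (q *JobQueue) running(user string) (forUser, total int) {
	for _, j := range q.jobs {
		if j.Status == Running {
			total++
			if j.User == user {
				forUser++
			}
		}
	}
	return forUser, total
}

// promote starts queued jobs, oldest first, while limits allow. Caller holds mu.
func (q *JobQueue) promote() {
	for id := 1; id <= q.nextID; id++ {
		j, ok := q.jobs[id]
		if !ok || j.Status != Queued {
			continue
		}
		forUser, total := q.running(j.User)
		if total >= q.global {
			return
		}
		if forUser < q.perUser {
			j.Status = Running
		}
	}
}

// Create queues a job and starts it right away if the limits allow.
func (q *JobQueue) Create(user, kind string) Job {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.nextID++
	j := &Job{ID: q.nextID, User: user, Kind: kind, Status: Queued}
	q.jobs[j.ID] = j
	q.promote()
	return *j
}

// List returns the user's jobs in creation order.
func (q *JobQueue) List(user string) []Job {
	q.mu.Lock()
	defer q.mu.Unlock()
	var out []Job
	for id := 1; id <= q.nextID; id++ {
		if j, ok := q.jobs[id]; ok && j.User == user {
			out = append(out, *j)
		}
	}
	return out
}

// Status reports the current state of a job.
func (q *JobQueue) Status(id int) (JobStatus, error) {
	q.mu.Lock()
	defer q.mu.Unlock()
	j, ok := q.jobs[id]
	if !ok {
		return "", ErrNotFound
	}
	return j.Status, nil
}

// Remove deletes a job and lets waiting jobs take the freed slot.
func (q *JobQueue) Remove(id int) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if _, ok := q.jobs[id]; !ok {
		return ErrNotFound
	}
	delete(q.jobs, id)
	q.promote()
	return nil
}

func main() {
	q := NewJobQueue(2, 50) // limits taken from the ranges suggested above
	a := q.Create("alice", "archive")
	q.Create("alice", "archive")
	c := q.Create("alice", "archive")
	fmt.Println(a.Status, c.Status)
	_ = q.Remove(a.ID)
	fmt.Println(q.Status(c.ID))
}
```

A real service would additionally need persistence, progress reporting, and workers that actually run the jobs; those are left out here on purpose.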

As for this ticket, it could be solved by implementing an "archive" job that would archive the target folder and leave the result either in the same parent folder or in the requested one. An interaction example: right click on the folder -> choose jobs in the menu -> choose archive -> fill in the popup with the requested options -> done. The user can then check the "jobs" menu to see the state of the job and do other things in the meantime. When the job finishes, they can go to the target folder and download the archive file like any other regular download.

The good thing about this solution is that it can be extended for future use. We could implement on-demand virus scanning, AI image generation, on-demand thumbnail generation, massive auto-tagging based on content (which might require content analysis of the files)...

kulmann commented 1 month ago

Nice idea @jvillafanez ! We already discussed a kind of Workflow Engine in the past. Seems to go in the same direction. Little bit of context: https://github.com/owncloud/ocis/issues/7437

jvillafanez commented 1 month ago

Yes, but at the same time no. There are a couple of big differences:

We could merge both ideas by providing a system queue only accessible to admins, or by planning for the job queues to have permissions (probably just read and write permissions to see and add jobs in the queue) so that admins are the only ones who can check the system queue. These system queues (maybe just one, but there could be more) could have their own limits, higher than the regular ones.

Note that with those changes, we'd need to track additional information, mainly for the system queue: who triggered the job, at what time the job was queued, at what time it started...

In any case, these are just ideas that will need research and planning, as well as proper scoping.