tus / tusd

Reference server implementation in Go of tus: the open protocol for resumable file uploads
https://tus.github.io/tusd
MIT License
2.95k stars 467 forks source link

handler: support range requests for GET /upload #1048

Closed markspolakovs closed 6 months ago

markspolakovs commented 6 months ago

Where the store supports it, use http.ServeContent to serve the contents of the upload, thereby adding support for range requests:

The main benefit of ServeContent over io.Copy is that it handles Range requests properly, sets the MIME type, and handles If-Match, If-Unmodified-Since, If-None-Match, If-Modified-Since, and If-Range requests.

This helps optimise downloading large files directly from tusd.

Acconut commented 6 months ago

Thank you for this PR! Do you know which stores return an io.ReadSeeker? It's probably only the file store.

markspolakovs commented 6 months ago

Do you know which stores return an io.ReadSeeker? It's probably only the file store.

From a cursory look that appears to be the case - though I imagine that in theory we could wrap e.g. s3store with a ReadSeeker that uses range requests. For now though this would only work for the file store, the Range header would be ignored for all the others.

Acconut commented 6 months ago

though I imagine that in theory we could wrap e.g. s3store with a ReadSeeker that uses range requests.

Yes, that would be possible, but likely not very efficient. From briefly looking through the code behind http.ServeContent, it's apparent that the function performs multiple seek operations to determine the file size, content type, and reading the requested range. Each of these seek operations would result in one range request to S3. So a single range request to tusd can trigger multiple requests to S3.

A better approach would be to pass the range request directly to the datastore, so it can fetch the requested range from the underlying storage provider. In tusd, we enabled storages to implement such optional features through additional interfaces. For example, a storage can implement the TerminaterDataStore to provide cancellable uploads:

https://github.com/tus/tusd/blob/e2784199a268bd03cdc188f376386fadc6735582/pkg/handler/datastore.go#L121-L132

We could define a similar interface for fetching ranges. For the s3store, the implementation would then send a range request to S3. For the filestore, it could just seek to the desired position and read from there. The handler can then also detect whether the storage supports range requests at all and set the Accept-Ranges header accordingly.

This approach would require more effort to get it working for your use case, but also allows this feature to be provided by other storages in the future. Maybe the interface could even be extended to also allow handling the If-* headers.

What do you think?

markspolakovs commented 6 months ago

I like that approach - it's a lot more effort, but it'd mean that we can extend the support to the other stores (I didn't realise that that's how ServeContent operates...).

Out of interest, I did some searching and found #432 where you were against implementing Range support in tusd - have you changed your mind on this? For context, my use case for adding this was to optimise another internal service downloading files from tusd - in the past when it failed it'd need to restart from the beginning. (I've worked around this by having this service read directly from minio, but it does mean I lose the flexibility of moving between backing stores and keeping tusd as a "generic" API).

In any case, we should probably move this conversation to an issue to sketch out API design if you're in favour of this in principle - closing this PR for now.

Acconut commented 6 months ago

Out of interest, I did some searching and found #432 where you were against implementing Range support in tusd - have you changed your mind on this?

Yes, I did change my opinion here. We have recently been involved in the HTTP working group with the goal of making a IETF standard for resumable uploads. In the end, we hope that resumable uploads are one component in the HTTP toolbox that interacts well with the existing HTTP mechanism. Since then I am more interested in making sure that tusd works better with concepts, such as range requests.

In any case, we should probably move this conversation to an issue to sketch out API design if you're in favour of this in principle - closing this PR for now.

Great idea, feel free to open an issue if you are still interested in working on this :)