Closed markspolakovs closed 6 months ago
Thank you for this PR! Do you know which stores return an io.ReadSeeker
? It's probably only the file store.
Do you know which stores return an io.ReadSeeker? It's probably only the file store.
From a cursory look that appears to be the case - though I imagine that in theory we could wrap e.g. s3store with a ReadSeeker that uses range requests. For now though this would only work for the file store, the Range header would be ignored for all the others.
though I imagine that in theory we could wrap e.g. s3store with a ReadSeeker that uses range requests.
Yes, that would be possible, but likely not very efficient. From briefly looking through the code behind http.ServeContent
, it's apparent that the function performs multiple seek operations to determine the file size, content type, and reading the requested range. Each of these seek operations would result in one range request to S3. So a single range request to tusd can trigger multiple requests to S3.
A better approach would be to pass the range request directly to the datastore, so it can fetch the requested range from the underlying storage provider. In tusd, we enabled storages to implement such optional features through additional interfaces. For example, a storage can implement the TerminaterDataStore
to provide cancellable uploads:
We could define a similar interface for fetching ranges. For the s3store, the implementation would then send a range request to S3. For the filestore, it could just seek to the desired position and read from there. The handler can then also detect whether the storage supports range requests at all and set the Accept-Ranges
header accordingly.
This approach would require more effort to get it working for your use case, but also allows this feature to be provided by other storages in the future. Maybe the interface could even be extended to also allow handling the If-*
headers.
What do you think?
I like that approach - it's a lot more effort, but it'd mean that we can extend the support to the other stores (I didn't realise that that's how ServeContent operates...).
Out of interest, I did some searching and found #432 where you were against implementing Range
support in tusd - have you changed your mind on this? For context, my use case for adding this was to optimise another internal service downloading files from tusd - in the past when it failed it'd need to restart from the beginning. (I've worked around this by having this service read directly from minio, but it does mean I lose the flexibility of moving between backing stores and keeping tusd as a "generic" API).
In any case, we should probably move this conversation to an issue to sketch out API design if you're in favour of this in principle - closing this PR for now.
Out of interest, I did some searching and found #432 where you were against implementing
Range
support in tusd - have you changed your mind on this?
Yes, I did change my opinion here. We have recently been involved in the HTTP working group with the goal of making a IETF standard for resumable uploads. In the end, we hope that resumable uploads are one component in the HTTP toolbox that interacts well with the existing HTTP mechanism. Since then I am more interested in making sure that tusd works better with concepts, such as range requests.
In any case, we should probably move this conversation to an issue to sketch out API design if you're in favour of this in principle - closing this PR for now.
Great idea, feel free to open an issue if you are still interested in working on this :)
Where the store supports it, use
http.ServeContent
to serve the contents of the upload, thereby adding support for range requests:This helps optimise downloading large files directly from tusd.