tus / tus-js-client

A pure JavaScript client for the tus resumable upload protocol
https://tus.io/
MIT License

Error "Requested data is before the reader's current offset" for Upload(stream) when resumption creates a new upload #580

Open · nh2 opened this issue 1 year ago

nh2 commented 1 year ago

The error `Requested data is before the reader's current offset` from https://github.com/tus/tus-js-client/blob/01d3948c1a0b96adee545af72cdaabc5c5e979b1/lib/browser/sources/StreamSource.js#L38-L39

is triggered when the following conditions are met jointly:

- the `Upload` is sourced by a `ReadableStream` (a Reader),
- a finite `chunkSize` is set,
- the upload has already progressed past the beginning of the stream,
- a transmission error occurs and resumption fails, so tus-js-client creates a new upload.

(These conditions hold in common setups, e.g. when you are behind Cloudflare (which requires a finite `chunkSize`) and you restart your tus server software for a split second for an upgrade.)

Then, this code triggers: https://github.com/tus/tus-js-client/blob/01d3948c1a0b96adee545af72cdaabc5c5e979b1/lib/upload.js#L672-L675

which creates a completely new upload starting at offset 0, but the ReadableStream is not reset and may already have advanced far past that offset.

Hence, `Requested data is before the reader's current offset` is necessarily triggered.

Automatically creating a new upload in the middle of the transfer seems logically impossible when the Upload is sourced by a ReadableStream, as the user is given no opportunity to create a stream on which another .slice(0, ...) can succeed.
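
For reference, a setup roughly like the following can run into this (the URLs are placeholders and the options are illustrative, not a full configuration):

```js
import * as tus from 'tus-js-client'

async function startStreamUpload() {
  const response = await fetch('https://example.com/big-file') // placeholder data source
  const reader = response.body.getReader()

  const upload = new tus.Upload(reader, {
    endpoint: 'https://tus.example.com/files/', // placeholder tus endpoint
    chunkSize: 50 * 1024 * 1024, // finite chunk size, e.g. to stay under a proxy's request size limit
    uploadLengthDeferred: true, // the stream's length is not known up front
    retryDelays: [0, 1000, 3000],
    onError: (err) => console.error('upload failed', err),
    onSuccess: () => console.log('upload finished'),
  })
  upload.start()

  // If the server-side upload is lost mid-transfer (e.g. the tus server is
  // briefly restarted), tus-js-client creates a new upload at offset 0, but
  // `reader` has already been consumed past that point, so StreamSource
  // throws "Requested data is before the reader's current offset".
}
```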

nh2 commented 1 year ago

This issue is made more likely to crop up by issue #579, as that increases the chances that "a transmission error occurs and resumption fails".

Acconut commented 1 year ago

> Automatically creating a new upload in the middle of the transfer seems logically impossible when the Upload is sourced by a ReadableStream, as the user is given no opportunity to create a stream on which another .slice(0, ...) can succeed.

Yes, that is correct. There is currently no way for an upload to recover from this kind of failure. The question is: how can we improve this situation in tus-js-client?

a) Just raise a different error saying that we cannot resume in such a case.
b) Provide some API so that the user can supply a new stream (but I am unsure how often this is actually possible).

nh2 commented 1 year ago

> a) Just raise a different error saying that we cannot resume in such a case.

Yes, I think the following would make sense:

- Change the error message to explain that Readers cannot be resumed once they progress past the first `chunkSize` bytes.

> b) Provide some API so that the user can supply a new stream (but I am unsure how often this is actually possible).

Isn't this theoretically already possible, by providing an arbitrary object as the argument to the Upload() constructor and then implementing the fileReader option so that it gets the data from somewhere else? (E.g. when slice(start, ...) observes start decrease back to 0, the user's slice() implementation could open a new source from somewhere.) See also #583, which I just filed.
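
A rough sketch of that idea: the `{ url, size }` input object is my own invention, and the source shape (openFile/slice/close returning `{ value, done }`) mirrors what the built-in readers appear to use, so it may differ between versions; depending on the version you may also need to set the uploadSize option.

```js
import * as tus from 'tus-js-client'

// Sketch of a custom fileReader that fetches each requested byte range on
// demand, so any `start` value, including a reset back to 0 after a failed
// resumption, can be served. The `{ url, size }` input object and the
// server's Range support are assumptions, not tus-js-client APIs.
class RangeRequestFileReader {
  async openFile(input, chunkSize) {
    return {
      size: input.size,
      async slice(start, end) {
        const response = await fetch(input.url, {
          headers: { Range: `bytes=${start}-${end - 1}` },
        })
        const value = await response.blob()
        return { value, done: end >= input.size }
      },
      close() {},
    }
  }
}

const upload = new tus.Upload(
  { url: 'https://example.com/big-file', size: 123456789 }, // placeholders
  {
    endpoint: 'https://tus.example.com/files/', // placeholder tus endpoint
    chunkSize: 8 * 1024 * 1024,
    fileReader: new RangeRequestFileReader(),
  },
)
upload.start()
```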


As a concrete example of how fileReader can solve more problems than passing a Reader to Upload():

I found a bug in Chrome ("FileReader 5x slower than in Firefox over Windows Share") that limits sequential File read throughput to 20 Mbit/s. A workaround is concurrent, batched reading of multiple File.slice()s. I first implemented it with a Reader but then hit this very problem. Only then did I find out that I can implement it with the fileReader option and get both custom IO patterns AND full resumability from offset 0.
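
The reading pattern, in outline (a sketch only; the slice and batch sizes are illustrative, not the values I actually use):

```js
// Read the File in parallel batches of slices instead of one sequential
// FileReader pass over the whole file.
async function readFileInBatches(file, sliceSize = 4 * 1024 * 1024, batchSize = 4) {
  const chunks = []
  for (let offset = 0; offset < file.size; offset += sliceSize * batchSize) {
    const batch = []
    for (let i = 0; i < batchSize && offset + i * sliceSize < file.size; i++) {
      const start = offset + i * sliceSize
      const end = Math.min(start + sliceSize, file.size)
      // Each slice().arrayBuffer() is an independent read, so the reads
      // overlap instead of being serialized behind one FileReader.
      batch.push(file.slice(start, end).arrayBuffer())
    }
    chunks.push(...(await Promise.all(batch)))
  }
  return chunks // ArrayBuffers in order; concatenate or serve from slice() as needed
}
```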

Acconut commented 1 year ago
> Change the error message to explain that Readers cannot be resumed once they progress past the first `chunkSize` bytes.

I am wondering if we currently have the capability to detect this state inside the Upload class. I do not think we do. If you have a file or buffer object, you can seek back to the beginning, so we cannot disallow it in all cases. And in the Upload class we do not directly have a property for detecting streaming resources (yet). So maybe we need to expose this in the FileReader interface.
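
Purely as an illustration of that idea (nothing below exists in tus-js-client today; the names are made up):

```js
// Hypothetical only: sources could advertise whether they can serve data
// from offset 0 again, and the Upload class could check that flag before
// creating a new upload. Neither `isRewindable` nor `assertCanRestart`
// exists in tus-js-client.
function assertCanRestart(source) {
  if (!source.isRewindable) {
    throw new Error(
      'Upload cannot be restarted from the beginning because its source cannot be read from offset 0 again.',
    )
  }
}

const blobLikeSource = { size: 1024, isRewindable: true } // files/buffers can seek back
const streamLikeSource = { size: null, isRewindable: false } // Readers cannot

assertCanRestart(blobLikeSource) // OK
assertCanRestart(streamLikeSource) // throws the clearer error from option a)
```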

> Isn't this theoretically already possible, by providing an arbitrary object as the argument to the Upload() constructor and then implementing the fileReader option so that it gets the data from somewhere else? (E.g. when slice(start, ...) observes start decrease back to 0, the user's slice() implementation could open a new source from somewhere.) See also #583, which I just filed.

Good point. That might work, although I have never tried it. Plus, I think this approach is rather unintuitive for most casual tus-js-client users. If we go and embrace a setup where a new stream can be supplied to tus-js-client, I would prefer it be done through a more explicit API.
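
For example, such an explicit API could look roughly like this (the `onRestartSource` option name is invented and not implemented; URLs are placeholders):

```js
import * as tus from 'tus-js-client'

// Hypothetical shape of an explicit "supply a new stream" API; the
// `onRestartSource` option does not exist in tus-js-client today.
async function startUpload() {
  const response = await fetch('https://example.com/big-file') // placeholder
  const upload = new tus.Upload(response.body.getReader(), {
    endpoint: 'https://tus.example.com/files/',
    chunkSize: 8 * 1024 * 1024,
    onRestartSource: async () => {
      // Called before a brand-new upload is created, so the application can
      // hand over a fresh Reader that starts at offset 0 again.
      const retried = await fetch('https://example.com/big-file')
      return retried.body.getReader()
    },
  })
  upload.start()
}
```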