tus / tus-resumable-upload-protocol

Open Protocol for Resumable File Uploads
https://tus.io
MIT License
1.5k stars 102 forks source link

Discussion: Detection of parallel uploads - respond with "423 Locked" #115

Open KieranP opened 7 years ago

KieranP commented 7 years ago

Saw this issue happen today:

Currently, one of them will return "409 Conflict" as the spec instructs and thus halts, while the other is was able to continue uploading fine.

"409" is good when the client does something it shouldn't and thus acts as a safeguard. But could we provide a better http status code when the server detects that the file is being uploaded already? e.g. The http status code "423" is defined as Locked, which could be more appropriate in this case.

This would allow us to catch "423" errors, and handle them appropriately.

Thoughts?

ronag commented 7 years ago

User began uploading the same file again (but to a different object)

But isn't that a different upload? How is the server supposed to find out that it is the same file?

KieranP commented 7 years ago

@ronag No, because the fingerprint matches, and thus is uses the same resumable url that it locates from the local storage. So it ends up uploading the same file to the same resumable url, causing the 409 conflict.

When I said "to a different object", I meant object within our own platform. So the user tried uploading the same file at the same time but to two different places within the platform.

cjhenck commented 7 years ago

I'm confused too. So you are uploading the same file from the same client to the same PATCH URL? How does the server know it is a conflict at all? Is the metadata different?

Acconut commented 7 years ago

e.g. The http status code "423" is defined as Locked, which could be more appropriate in this case.

The tusd server implements locking which will return a 423 if multiple client try to access the same upload simultaneously. However, this is custom behavior and is not defined in the specification.

No, because the fingerprint matches, and thus is uses the same resumable url that it locates from the local storage. So it ends up uploading the same file to the same resumable url, causing the 409 conflict.

If you configure tus-js-client correctly using its retryDelays property, it will automatically catch those errors and start a new upload to a different upload URL.

I think the underlying question is what should actually happen in the scenario @KieranP described? What did you expect? Should the same file be uploaded to two different upload URLs? Should one wait until the other is finished since they assume they are uploading the same file?

KieranP commented 7 years ago

@cjhenck Yes, same file, same client, same PATCH URL because it pulls it from local storage. I imagine that

The tusd server implements locking which will return a 423 if multiple client try to access the same upload simultaneously. However, this is custom behavior and is not defined in the specification.

@Acconut Cool to hear that tusd already does this. It would be good to get this in as a plugin on the spec though so that other servers implement this behaviour.

I think the underlying question is what should actually happen in the scenario @KieranP described? What did you expect? Should the same file be uploaded to two different upload URLs? Should one wait until the other is finished since they assume they are uploading the same file?

I'd much rather catch a 423 error and report to the user that the file is already being uploaded. It makes no sense to me to have an identical copy of a 30GB file uploading to the same server. But this could always be a client setting if the server supports it, something like preventDuplicate: true

cjhenck commented 7 years ago

I think I was tired this week and failed to read the title of the issue, everything you've said makes complete sense. If a 423 is returned in the reference implementation then I would think that it would be good to define that as a recommendation.

ronag commented 7 years ago

The way we solve this issue is to actually allow parallel uploads under the assumption that both clients are uploading the same data and we don't check that upload-offset is actually at the current end.

Acconut commented 7 years ago

@KieranP: I'd much rather catch a 423 error and report to the user that the file is already being uploaded. It makes no sense to me to have an identical copy of a 30GB file uploading to the same server. But this could always be a client setting if the server supports it, something like preventDuplicate: true

I agree that should a setting for the client would make sense. I will look into how this can be achieved. Furthermore, this may be a good opportunity to add a recommendation for this to the specification. However, I am currently not sure whether this should be a short sentence in the core protocol or deserves its own extension as there are some more cases to consider:

The tusd server will also return 423 Locked when a HEAD request is issued while a PATCH request is running to the same upload URL. The reason is that the Upload-Offset header may not be correct anymore as the offset is only updated once, if the PATCH request completes but not in between. Personally, I am not sure if this is the best approach but for now it has served the purpose.

@ronag: The way we solve this issue is to actually allow parallel uploads under the assumption that both clients are uploading the same data and we don't check that upload-offset is actually at the current end.

Honestly, this seems very dangerous to me. If we added the locking mechanism as we talked about before, would you benefit from this additional security?

ronag commented 7 years ago

Honestly, this seems very dangerous to me.

I don't see how it is any less dangerous than appending to the file. Currently there is no guarantee that the data appended is from the same source. In what sense would you consider just overwriting more dangerous?

If we added the locking mechanism as we talked about before, would you benefit from this additional security?

Not really...

kaharman commented 6 years ago

I just want to add another case for status 423 Locked. I upload a file with web client, then in the middle of process the internet connection is broke up. Then I change with another connection method with expectation I can resume the upload, let say from wifi to LAN cable. What I got is error 423. I don't know how to handle this since I just upload single file in same browser, but with different connection. Do you have an idea @Acconut ?

Acconut commented 6 years ago

@kaharman The problem in your situation is that the server does not always know that the connection broke and therefore still waits for the first client to upload content. In this case, the upload is still locked if the second client attempts to upload. However, time will usually solve this problem. The server will notice after a timeout that the connection to the first client is broken and will unlock the upload again allowing the second client to continue the upload. In summary, you would just have to wait (usually a few seconds in my experience) before retrying.

mitar commented 6 months ago

Yes, same file, same client, same PATCH URL because it pulls it from local storage.

To me this looks like local storage caching in the browser is not properly implemented and it should be keyed based on your platform's object ID as well. So if user is uploading a file to object1 then local storage should cache the file only for object1 uploading. If user tries to upload a file again to object2, it should not go from local storage, thus it would be uploaded normally twice. And you would not have this problem. So this to me looks like a caching in local storage issue.