tus / tusd

Reference server implementation in Go of tus: the open protocol for resumable file uploads
https://tus.github.io/tusd
MIT License

Unclear if filelocker has to be used with filestore #1110

Closed mitar closed 2 months ago

mitar commented 2 months ago

Question

So I am reading the embedding documentation, and the filestore package states for its New function:

In addition, a locking mechanism is provided.

But looking at its implementation I do not see any locking?

So should filelocker (or even memorylocker) be used together with filestore or not?

Setup details

I am trying to use tusd embedded in my own Go server.

Acconut commented 2 months ago

Yes, you should always use a locker. Please read https://tus.github.io/tusd/advanced-topics/locks/ for more details.

The locker implementation has been pulled out of filestore into its own package, filelocker, and I will adjust the comments. Thanks for letting us know. We should also consider making the locker interface mandatory in tusd, so that people do not leave it out accidentally.

mitar commented 2 months ago

Oh, I missed that section completely because I went straight for the "embedding in Go program" section. Yes, that section nicely explains it.

We should also consider making the locker interface mandatory in tusd, so that people do not leave it out accidentally.

Yea.

On the other hand, there could also be a different protocol design, where each upload would go to a temporary file ID and only at the end would it be renamed to the target name. Similar to how you support uploading different parts in parallel and then combining them, the same design could be used always, and then no lock is necessary: every client just starts uploading its chunk. You only have to store metadata about which chunk it is so that you can later combine them into the full file. Duplicate chunks might get uploaded, but that is fine; the extra data just gets discarded when combining.

Acconut commented 2 months ago

Oh, I missed that section completely because I went straight for the "embedding in Go program" section. Yes, that section nicely explains it.

I have pushed a few commits to update the documentation and examples in a few places, hopefully making the relevance of lockers easier to understand for new users. Feel free to check them out and let me know if we could do better. The commits are linked above my comment.

We will also discuss making the locker mandatory in future releases. Using tusd programmatically without a locker would then produce a meaningful error. See https://github.com/tus/tusd/issues/1119

On the other hand, there could also be a different protocol design, where each upload would go to a temporary file ID and only at the end would it be renamed to the target name. Similar to how you support uploading different parts in parallel and then combining them, the same design could be used always, and then no lock is necessary: every client just starts uploading its chunk. You only have to store metadata about which chunk it is so that you can later combine them into the full file. Duplicate chunks might get uploaded, but that is fine; the extra data just gets discarded when combining.

Yes, that would be a different approach with its own advantages and drawbacks. It sounds similar to how S3 handles multipart uploads and works well in some applications. But that is not the design we have with tus :)

Acconut commented 2 months ago

By the way, I will close this issue for now since I feel that your original question was answered. Let me know if that's not the case and I will reopen this. If you have other topics, feel free to open a new issue.

mitar commented 2 months ago

Awesome, thanks.

Yes, that would be a different approach with its own advantages and drawbacks.

Do you have some reference or something for the drawbacks of such an approach? Just out of curiosity. (Especially given that you do support this for parallel uploading.)

Acconut commented 2 months ago

One drawback of an S3-multipart-like upload is that you need an additional request to commit the upload, telling the server which parts to use in which order.

In addition, it would be harder for clients to resume an upload. With tus' approach, the client retrieves the offset and continues the upload. With a multipart-like upload, the client would have to retrieve a list of uploaded parts and then figure out how to resume from there.

These are not unsolvable issues, but still hurdles.

mitar commented 2 months ago

Thanks!