tus / tus-resumable-upload-protocol

Open Protocol for Resumable File Uploads
https://tus.io
MIT License
1.48k stars 103 forks source link

Partial Checksum Extension #172

Open felix-schwarz opened 3 years ago

felix-schwarz commented 3 years ago

When using just the Checksum extension, the Server is required to discard the data of PATCH requests that ended prematurely (f.ex. because of a loss of connection), so that Clients need to re-send the entire chunk.

This extension allows Clients that use the Checksum extension to verify and confirm the integrity of chunks that were not received in full by the Server - thereby avoiding that data that was already transferred intact needs to be sent again.

Feedback welcome!

smatsson commented 3 years ago

I like this extension! To be honest I think this is what the original checksum extension should have looked like. Hindsight is 20/20 I guess :) Naming is hard and I found the name of the extension and headers a bit confusing. Will have to think about other names though as nothing comes to mind right now.

Acconut commented 3 years ago

First of all, thank you for this PR! I have currently a lot on my plate and am a bit overworked, so I don't know yet when I have time to look into this.

felix-schwarz commented 3 years ago

I like this extension! To be honest I think this is what the original checksum extension should have looked like. Hindsight is 20/20 I guess :) Naming is hard and I found the name of the extension and headers a bit confusing. Will have to think about other names though as nothing comes to mind right now.

Happy you like it! 😀

As with the last extensions I'm completely open to alternative name suggestions. As the extension adds kind of a commit/rollback mechanism, I originally thought of "Checksum Commit", but then found it hard to use the same headers for HEAD response and PATCH request, since only the latter would actually "commit" the partial data.

felix-schwarz commented 3 years ago

First of all, thank you for this PR! I have currently a lot on my plate and am a bit overworked, so I don't know yet when I have time to look into this.

Thanks for letting me know - and I hope you can find some time to rest and recharge soon.

smatsson commented 3 years ago

I like this extension! To be honest I think this is what the original checksum extension should have looked like. Hindsight is 20/20 I guess :) Naming is hard and I found the name of the extension and headers a bit confusing. Will have to think about other names though as nothing comes to mind right now.

Happy you like it! 😀

As with the last extensions I'm completely open to alternative name suggestions. As the extension adds kind of a commit/rollback mechanism, I originally thought of "Checksum Commit", but then found it hard to use the same headers for HEAD response and PATCH request, since only the latter would actually "commit" the partial data.

I've been thinking about the name for a bit and think that first of all we should switch the name around to have "checksum" first. This follows the pattern for other sub extensions (e.g. creation/creation-with-upload and concatenation/concatenation-unfinished). Not sure if "checksum-partial" is a good name though as it's not the checksum in itself that is partial but rather the data. I like the idea of "checksum-commit" but I feel like it might be a bit confusing.

Some suggestions (and why they might be OK):

What do you think @felix-schwarz ?

felix-schwarz commented 3 years ago

I've been thinking about the name for a bit and think that first of all we should switch the name around to have "checksum" first. This follows the pattern for other sub extensions (e.g. creation/creation-with-upload and concatenation/concatenation-unfinished).

Very good point!

Not sure if "checksum-partial" is a good name though as it's not the checksum in itself that is partial but rather the data. I like the idea of "checksum-commit" but I feel like it might be a bit confusing.

My biggest issue with checksum-commit was that "commit" only really works for the client to server request that "commits" the partial data, but is confusing when returned by the server in a HEAD response. And I didn't want to have two different header names whose content would be identical.

Some suggestions (and why they might be OK):

  • checksum-unfinished - There is data written that have not been finalized yet, since we cannot validate the checksum.
  • checksum-unverified - Same reason as above
  • checksum-partial-data - It's the checksum extension but with support for partial data on disconnects.

Thanks! I think checksum-unverified has the same issue as checksum-commit, just reversed: it really only works well for the HEAD response, but not for the request confirming it.

Another one I could think of is checksum-segment, because the checksum targets only a segment of the file.

Depending on choice, that'd result in these header names:

1) Upload-Checksum-Unfinished + Upload-Checksum-Unfinished-Range: summarizes the use case of this extension quite well, but not sure about the "Checksum Unfinished" name for the extension. 2) Upload-Checksum-Partial-Data + Upload-Checksum-Partial-Data-Range: nicely reflects the use of "partial data" in the proposed spec. Extension could be called "Checksum for Partial Data" or "Checksum with Partial Data". 3) Upload-Checksum-Segment + Upload-Checksum-Segment-Range: similar to "partial data", but in a single word. Extension name not so obvious, however. Maybe "Checksum for Segment".

I'm fine with all of them, but if I had to pick one, it'd likely be Partial Data as it works well for both header names and extension name(s).

What do you think @smatsson?

smatsson commented 3 years ago

Pardon the late reply! I've been swamped the last couple of weeks.

Naming is truly hard :D I think most of these are OK and just to make it more complicated I will also throw in checksum-resumable in the mix. I'm thinking that the original checksum extension is not resumable as the data will be discarded if the checksum mismatches (I found this quite confusing when adding support to tusdotnet, https://github.com/tus/tus-resumable-upload-protocol/issues/143).

I'm wondering if we have to call the extension the same name as the headers? Probably not? I mean we could call the extension "Checksum with partial data" (or "Checksum with segments" or whatever) while the headers are still called Upload-Checksum-Partial-Data-Range etc. Segment has a nice ring to it but might be confusing in regards to chunks (the words being similar, not the concept)? I agree with you that partial data is my favorite so far.

I've been meaning to write a POC on this extension (as with Client-Tag and Challenge) but just haven't gotten around to it.