owncloud / ios-sdk

📱 📦 iOS SDK for ownCloud
GNU General Public License v3.0
10 stars 8 forks source link

TUS upload support (WIP) #65

Closed felix-schwarz closed 1 year ago

felix-schwarz commented 4 years ago

The SDK should provide support for uploading files via the TUS protocol.

Notable observations from reading the spec:

1. Background NSURLSession friendly

Requests / PATCH … The Server SHOULD always attempt to store as much of the received data as possible.

If the server does store as much of the received data as possible, the SDK has an easier time to comply with requirements for NSURLSession background queues and avoiding penalties:

2. Schedulable

Protocol Extensions / Creation … The Client and the Server SHOULD implement the upload creation extension.

It would be preferable if clients were able to directly start an upload with a single request:

3. Defined expiration

Protocol Extensions / Upload-Expires … The `Upload-Expires response header indicates the time after which the unfinished upload expires. A Server MAY wish to remove incomplete uploads after a given period of time to prevent abandoned uploads from taking up extra storage. The Client SHOULD use this header to determine if an upload is still valid before attempting to resume the upload.

This helps in determining whether an upload should be continued or not - and resume only uploads that are known to still be around. On the other hand, a HEAD request would be required anyway before resuming an upload, at which point an expired upload should also become apparent.

If the expiration date should be supported and utilized, though, adding support for expiration directly to the OCHTTP system should be considered, with requests being terminated with a new error code in case they have expired before having been scheduled.

4. Checksum troubles

The Checksum extension needs checksums to be provided on a per-request basis, calculated not over the entire file but over the body of the respective upload requests.

This is generally fine, but does not cover the scenario where an upload is interrupted and the server should use the already received bytes:

Possible solutions:

1. Store checksums, check when upload is complete

The specification provides this hint:

Once the entire request has been received, the Server MUST verify the uploaded chunk against the provided checksum using the specified algorithm.

The solution therefore could be to store all checksums and only verify them against the respective parts once they have been received in full.

Drawback:

2. Custom header with the full file checksum

A custom header with the full file checksum (f.ex. OC-Full-Upload-Checksum) is passed to the Create With Upload extension when the upload is initiated. That would allow verification of the full file once the upload has completed.

Drawback:

3. Custom header with the checksum over already transferred data

An additional, custom header with the checksum of the file up to the point the upload resumes from (f.ex. OC-Transmitted-Upload-Checksum) would allow the server to check if the data received before is consistent - and allow the server to cancel an already.

Drawback:

Drawback mitigation:

Pragmatic and performant

A pragmatic and performance-oriented approach would likely by a combination of 1. and 3.

Related issues

Known issues

The current implementation in develop has the following known issues:

butonic commented 4 years ago

1. core protocol

2. creation-with-upload extension

We can implement this on several endpoints:

To directly start an upload at the dataprovider clients would have to send the POST to the datagateway at /data because it needs to look up which dataprovider is actually responsible for the upload.

To implement the create extension we already do the necessary stat and InitiateFileUpload requests in ocdav and itr should be possiple to support the create-with-uploadextension as well.

The js tus client tries to send data with the initial request and falls back to just create if the server does not support it. So we should implement it asap. This is an easy win and halves thke number of requests.

3. expiration extension

should be implemented, but we need to agree on default expiry ... 24h?

4. checksum extension

The checksum extension only really makes sense when using multiple PATCH requests or when implementing and using the concatenation as well, because a failed PATCH request MUST discard the received bytes for the request. We can either implement Add Tus-Min/Max-Chunk-Size headers whis is not yet official but absolutely makes sense, because it would allow the admin to configure a sena chunk size for his deployment. Clients could then send the chunks using PATCH requests with a checksum per request. To consume all of the bandwith the concatenation extension can be used to allow uploading the chunks in parallel.

To mimic OC10 checksumming, I allow providing a checksum in the metadata that can be checked on the server side if the full file is available.

I prefer using Tus-Min/Max-Chunk-Size headers: it allows resuming and can be tuned by the admin as necessary. No urgent need to go parallel & concatenation right now ... someday, maybe.

If we went to invent a new checksumming extension we should create that as a PR in https://github.com/tus/tus-resumable-upload-protocol.

michaelstingl commented 4 years ago

3. expiration extension

should be implemented, but we need to agree on default expiry ... 24h?

Okay. Same as previous chunking implementations.

michaelstingl commented 4 years ago

post with metadata currently uses metadata to send the checksum: https://github.com/cs3org/reva/pull/674/files#diff-5fec0456a6ea9fb1227335fc8d3f8cfdR150

felix-schwarz commented 4 years ago

TUS support development has moved to develop #61