It's that time again.
Video Upload Request: An active S3-compatible Multipart Upload. MUST be aborted by the server after an inactivity timeout. Only one per user at any given time. If another upload request is made, the user MUST choose whether they wish to abort the currently in-flight upload request.
Video Upload Request Metadata: The metadata associated with the Video Upload Request. This MUST get validated. This MAY include a token that MAY be required for confirmation of client-side cancellation.
GET /upload/video/request
Get the current video upload request.
Returns:
PUT /upload/video/request
Request a signed upload URL.
Data:
Returns:
201 Created
with the signed URL
409 Conflict
with the current Video Upload Request Metadata.

PUT /upload/video/abort
Abort the user's current video upload request. This MAY require passing a token associated with the current request.
201 Created
if the abort succeeds.

Why only 1 per user? Or is this 1 per browser tab?
What does it mean to abort the request?
What does it mean to GET the current upload request?
Would it be enough to have the client post to the server "hey I wanna upload this" and then "hey I'm done, go check that it's there"?
> Why only 1 per user? Or is this 1 per browser tab?

1 per user, since hopefully the user's connection to Object Storage will get saturated by the one multipart.
> What does it mean to abort the request?

S3-compatible Multipart API term. https://docs.aws.amazon.com/AmazonS3/latest/API/API_AbortMultipartUpload.html
> What does it mean to GET the current upload request?

Upload requests have TBD metadata. Example: associated filename, user, part, project, etc. GET just returns a representation of this.
> I think it would be enough to have the client post to the server "hey I wanna upload this" and then "hey I'm done go check that it's there"
I actually didn't think of the finalization part. I'll edit something into the next draft.
The GET might not be necessary, tbh.
> 1 per user
Ok that makes sense!
> Abort
Ah gotcha, so vvgo.org would basically act as a gateway for the abort request, and S3 is the thing that takes action.
> Ah gotcha, so vvgo.org would basically act as a gateway for the abort request, and S3 is the thing that takes action.
It's there because of the 1/user semaphore: there needs to be an unlock mechanism. I guess another benefit of the semaphore is that we don't have to keep track of Upload Request IDs, since there's only ever one per user at a given time.
Honestly, we probably could do without, but I think it's best to be safe. Your call on complexity, though.
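Since the one-upload-per-user semaphore came up, here's a minimal sketch of how it might look with Redis (using ioredis; the key naming and TTL are assumptions, not anything we've agreed on):

```typescript
import Redis from "ioredis";

const redis = new Redis(); // hypothetical connection; real config TBD

// Try to acquire the per-user upload semaphore. Fails if an upload
// request is already in flight for this user.
async function acquireUploadSlot(userId: string, ttlSeconds: number): Promise<boolean> {
  // SET ... NX only succeeds if the key doesn't already exist, and the
  // TTL doubles as the inactivity timeout: an abandoned upload unlocks
  // itself eventually.
  const result = await redis.set(`upload:video:${userId}`, "in-flight", "EX", ttlSeconds, "NX");
  return result === "OK";
}

// The abort endpoint's unlock mechanism, conceptually: drop the key.
async function releaseUploadSlot(userId: string): Promise<void> {
  await redis.del(`upload:video:${userId}`);
}
```

A nice side effect of the TTL is that it directly implements the "MUST be aborted by the server after an inactivity timeout" requirement, at least for the lock itself.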
Video Upload Request: An active S3-compatible Multipart Upload. MUST be aborted by the server after an inactivity timeout. Only one per user at any given time. If another upload request is made, the user MUST choose whether they wish to abort the currently in-flight upload request.
Video Upload Request Metadata: The metadata associated with the Video Upload Request. This MUST get validated. This MAY include a token that MAY be required for confirmation of client-side cancellation.
GET /upload/video/request
Get the current video upload request.
Returns:
PUT /upload/video/request
Request a signed upload URL.
Data:
Returns:
201 Created
with the signed URL
409 Conflict
with the current Video Upload Request Metadata.

POST /upload/video/abort
Abort the user's current video upload request. This MAY require passing a token associated with the current request.
Returns:
200 OK
if the abort succeeds.

POST /upload/video/finalize
Finalize the user's current video upload request. This should only be called after all parts are successfully uploaded and the client finalizes the upload on S3.
Returns:
200 OK
if the Video Upload Request succeeded.
404 Not Found
if the user does not have an associated Video Upload Request.

I think you brought up a reasonable point regarding uploads. I'm not really sure how that would look from two separate browser windows, but it doesn't matter too much to me. Implementation-wise, we're gonna need a Redis session anyways to correlate some cookie value to at least the DO abort request. So I don't think either way it will add more than 1 or 2 if blocks.
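Purely as a sketch of how the draft above might be exercised from the browser, assuming JSON bodies and a `signedUrl` field in the 201 response (none of which the draft pins down yet):

```typescript
// Hedged sketch of the client-side happy path against the draft endpoints.
// The metadata shape and the `signedUrl` response field are assumptions.
async function uploadVideo(
  metadata: Record<string, string>,
  doUpload: (signedUrl: string) => Promise<void>
): Promise<void> {
  // 1. Open a Video Upload Request and get a signed upload URL.
  const res = await fetch("/upload/video/request", {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(metadata),
  });
  if (res.status === 409) {
    // Another upload is in flight; per the spec, the user must choose
    // whether to abort it (POST /upload/video/abort) before retrying.
    throw new Error("an upload request is already in flight");
  }
  const { signedUrl } = await res.json();

  // 2. Upload the parts directly to object storage.
  await doUpload(signedUrl);

  // 3. Tell vvgo.org we're done so it can verify the object is there.
  await fetch("/upload/video/finalize", { method: "POST" });
}
```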
For the GET request, would the client be able to store this data locally? It could and probably should come in the response of the first POST to vvgo.org, and then we can store it in browser mem. Would there be a situation where the browser could lose this info, but the upload would otherwise be resumable?
Suppose we don't get the finalize request or an abort request. Does S3 time out upload requests, or should we implement some timeout server-side?
> For the GET request, would the client be able to store this data locally? It could and probably should come in the response of the first POST to vvgo.org, and then we can store it in browser mem. Would there be a situation where the browser could lose this info, but the upload would otherwise be resumable?

I guess we can include a SHA hash with the Video Upload Metadata, as well as the signed URL. Use WebCrypto to hash the blob, etc.
We don't really need to store it locally.
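For what it's worth, the WebCrypto part is pretty small; a minimal sketch of hashing a blob, assuming SHA-256 is the digest we'd validate against:

```typescript
// Hash a Blob (or File) with WebCrypto and return a hex digest.
// crypto.subtle.digest is standard in browsers; SHA-256 is an assumption.
async function sha256Hex(blob: Blob): Promise<string> {
  const buf = await blob.arrayBuffer();
  const digest = await crypto.subtle.digest("SHA-256", buf);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```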
> Suppose we don't get the finalize request or an abort request. Does S3 time out upload requests, or should we implement some timeout server-side?
We need to deal with it manually. The idiomatic way is through lifecycle rules, IIRC. Scheduled tasks, woo.
> I guess we can include a SHA hash with the Video Upload Metadata, as well as the signed URL. Use WebCrypto to hash the blob, etc.
That seems extra complicated. I guess I was thinking something along the lines of the js app getting restarted, like page refresh or something, would it be able to resume in that scenario? I'm trying to understand why the GET request is necessary. (I don't have a problem with it, I'm just not sure how it would be used).
Lifecycle management for the upload request makes sense. We've already got the start/stop semantics. We'll just need a garbage collector to make sure everything is nice and tidy. The only benefit here will be in removing cruft from object storage... which only matters if we start hitting 500GB+, so I think we can de-prioritize that work.
By storing locally, I meant storing in memory, not as a file on disk or anything.
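If we do end up writing that garbage collector ourselves rather than leaning on bucket lifecycle rules, the sweep is small; a sketch with the AWS SDK (bucket name and max age are placeholders):

```typescript
import {
  S3Client,
  ListMultipartUploadsCommand,
  AbortMultipartUploadCommand,
} from "@aws-sdk/client-s3";

// Scheduled task: abort multipart uploads older than maxAgeMs.
async function sweepStaleUploads(s3: S3Client, bucket: string, maxAgeMs: number): Promise<void> {
  const { Uploads = [] } = await s3.send(
    new ListMultipartUploadsCommand({ Bucket: bucket })
  );
  const cutoff = Date.now() - maxAgeMs;
  for (const u of Uploads) {
    if (u.Initiated && u.Initiated.getTime() < cutoff && u.Key && u.UploadId) {
      await s3.send(
        new AbortMultipartUploadCommand({ Bucket: bucket, Key: u.Key, UploadId: u.UploadId })
      );
    }
  }
}
```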
> That seems extra complicated. I guess I was thinking something along the lines of the js app getting restarted, like page refresh or something, would it be able to resume in that scenario? I'm trying to understand why the GET request is necessary. (I don't have a problem with it, I'm just not sure how it would be used).
Hmm. I guess we can persist the URL plus a hash and a segment count locally in localStorage.
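Something like this is all I had in mind for the localStorage side; the key and state shape are made up for illustration:

```typescript
// Resume state we'd want to survive a page refresh. All field names here
// are assumptions, not part of the draft spec.
interface ResumeState {
  signedUrl: string;
  fileSha256: string;
  partCount: number;
}

const RESUME_KEY = "vvgo.videoUpload.resume"; // hypothetical key

function saveResumeState(state: ResumeState): void {
  localStorage.setItem(RESUME_KEY, JSON.stringify(state));
}

function loadResumeState(): ResumeState | null {
  const raw = localStorage.getItem(RESUME_KEY);
  return raw ? (JSON.parse(raw) as ResumeState) : null;
}
```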
I'm not suggesting that we should support retries in that scenario. I guess I just don't understand the lifecycle of local variables in JavaScript. It seems like after the first POST request, you could just store the response in a local var and access that var as needed during the upload to S3. Is that not the case?
Oh, I was talking about the restart/resume case, not the happy path.
Yeah, the happy path has everything in a local var.
Ok cool that makes sense!
Regarding restarts/resumes: how would that work from the client side? :thinking: What info does the client need to upload the file anyways? Is the path on disk enough, or is there other stuff that's needed?
> 🤔 What info does the client need to upload the file anyways? Is the path on disk enough, or is there other stuff that's needed?
It'd be a blob/stream through the File API, so we'd keep track of which chunks were sent locally, as well as the ETag responses, and chunk SHAsums. We need them to finalize the S3 side anyway.
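Roughly, the chunk loop would look like this with the File API (the part size is arbitrary, `signedUrlFor` is a hypothetical helper, and reading the ETag header cross-origin requires the bucket's CORS config to expose it):

```typescript
const PART_SIZE = 16 * 1024 * 1024; // 16 MiB, arbitrary for illustration

// Slice the file into parts, PUT each to its presigned URL, and spool
// the ETag responses we'll need to complete the multipart upload.
async function uploadParts(
  file: File,
  signedUrlFor: (partNumber: number) => string
): Promise<{ partNumber: number; etag: string }[]> {
  const etags: { partNumber: number; etag: string }[] = [];
  for (let i = 0; i * PART_SIZE < file.size; i++) {
    const chunk = file.slice(i * PART_SIZE, (i + 1) * PART_SIZE);
    const res = await fetch(signedUrlFor(i + 1), { method: "PUT", body: chunk });
    // S3 returns the part's ETag in a response header (visible to JS only
    // if the bucket's CORS config exposes it).
    etags.push({ partNumber: i + 1, etag: res.headers.get("ETag") ?? "" });
  }
  return etags;
}
```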
I'm reading through the API docs for multipart upload, and now I'm kinda understanding the limitations here. We definitely don't need any S3 client code. These are all pretty straightforward query params that we can add to the presigned URL, and that really makes me happy.
https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html
https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html
https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html
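One caveat worth noting, IIRC: with SigV4 the query string is part of what gets signed, so the `partNumber`/`uploadId` params have to be included when the server generates each presigned URL, not appended client-side afterwards. The CompleteMultipartUpload body itself is just a small XML document; a sketch of building it from the spooled ETags:

```typescript
// Build the CompleteMultipartUpload request body from the parts we
// uploaded (see the third doc linked above for the full schema).
function completeMultipartBody(parts: { partNumber: number; etag: string }[]): string {
  const entries = parts
    .map((p) => `<Part><PartNumber>${p.partNumber}</PartNumber><ETag>${p.etag}</ETag></Part>`)
    .join("");
  return `<CompleteMultipartUpload>${entries}</CompleteMultipartUpload>`;
}
```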
:thinking: Do we want to store the ETags in Redis?
I feel like we don't need to. I think we can spool the responses from S3 and then just send one batch payload to vvgo.org/finalize. If the client app is interrupted or whatever, then it can just start from scratch.
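The batch payload to finalize could be as simple as this (the shape is an assumption, just to make "spool and send once" concrete):

```typescript
// Send the spooled part ETags to vvgo.org in one shot.
async function finalizeUpload(parts: { partNumber: number; etag: string }[]): Promise<void> {
  const res = await fetch("/upload/video/finalize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ parts }),
  });
  if (!res.ok) throw new Error(`finalize failed: ${res.status}`);
}
```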
Also, I want to generalize this API for sheet music and click track file uploads as well, not just video uploads.
However, we only need to make the view for video submissions.
I think we're not going to do this/de-prioritize this in favor of notification webhooks from Dropbox or Google Forms.
\OoOoOoOoOoO/
Yay, it's time to make a video submissions page! We'll need to work with the graphic design team to work out the front end.
The backend should:
"$scoreOrder $partName - $creditedName ($instrument).ext"
We'll work with @Ranzha to decide the final format.

I think for the upload part, we can extend the current uploads controller.
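Just to make the naming scheme concrete, a throwaway sketch of the template (field names come straight from the draft above; the final format is still TBD with @Ranzha):

```typescript
// Render the proposed submission filename from its fields.
function submissionFileName(f: {
  scoreOrder: string;
  partName: string;
  creditedName: string;
  instrument: string;
  ext: string;
}): string {
  return `${f.scoreOrder} ${f.partName} - ${f.creditedName} (${f.instrument}).${f.ext}`;
}
```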