virtual-vgo / vvgo

Virtual Video Game Orchestra
https://vvgo.org
Apache License 2.0

Video Submissions Page #155

Closed jacksonargo closed 4 years ago

jacksonargo commented 4 years ago

\OoOoOoOoOoO/

Yay, it's time to make a video submissions page! We'll need to work with the graphic design team on the front end.

The backend should:

I think for the upload part, we can extend the current uploads controller.

0az commented 4 years ago

It's that time again.

RFC: Video Upload API

Concepts

Video Upload Request: An active S3-compatible Multipart Upload. MUST be aborted by the server after an inactivity timeout. Only one per user at any given time. If another upload request is made, the user MUST choose whether they wish to abort the currently in-flight upload request.

Video Upload Request Metadata: The metadata associated with the Video Upload Request. This MUST get validated. This MAY include a token that MAY be required for confirmation of client-side cancellation.

GET /upload/video/request

Get the current video upload request.

Returns:

PUT /upload/video/request

Request a signed upload URL.

Data:

Returns:

PUT /upload/video/abort

Abort the user's current video upload request. This MAY require passing a token associated with the current request.

jacksonargo commented 4 years ago

Why only 1 per user? Or is this 1 per browser tab?

What does it mean to abort the request?

jacksonargo commented 4 years ago

What does it mean to GET the current upload request?

Would it be enough to have the client post to the server "hey I wanna upload this" and then "hey I'm done, go check that it's there"?

0az commented 4 years ago

Why only 1 per user? Or is this 1 per browser tab?

1 per user, since hopefully the user's connection to Object Storage will get saturated by the one multipart upload.

What does it mean to abort the request?

It's an S3-compatible Multipart API term: https://docs.aws.amazon.com/AmazonS3/latest/API/API_AbortMultipartUpload.html

What does it mean to GET the current upload request?

Upload requests have TBD metadata, e.g. associated filename, user, part, project, etc. GET just returns a representation of this.

I think it would be enough to have the client post to the server "hey I wanna upload this" and then "hey I'm done go check that it's there"

I actually didn't think of the finalization part. I'll edit something into the next draft.

The GET might not be necessary, tbh.

jacksonargo commented 4 years ago

1 per user

Ok that makes sense!

Abort

Ah gotcha, so vvgo.org would basically act as a gateway for the abort request, and S3 is the thing that takes action.

0az commented 4 years ago

Ah gotcha, so vvgo.org would basically act as a gateway for the abort request, and S3 is the thing that takes action.

It's there because of the 1/user semaphore: there needs to be an unlock mechanism. I guess another benefit of the semaphore is that we don't have to keep track of Upload Request IDs, since there's only ever one per user at a given time.

Honestly, we could probably do without it, but I think it's best to be safe. Your call on complexity, though.

0az commented 4 years ago

RFC: Video Upload API [Draft 2]

Concepts

Video Upload Request: An active S3-compatible Multipart Upload. MUST be aborted by the server after an inactivity timeout. Only one per user at any given time. If another upload request is made, the user MUST choose whether they wish to abort the currently in-flight upload request.

Video Upload Request Metadata: The metadata associated with the Video Upload Request. This MUST get validated. This MAY include a token that MAY be required for confirmation of client-side cancellation.

GET /upload/video/request

Get the current video upload request.

Returns:

PUT /upload/video/request

Request a signed upload URL.

Data:

Returns:

POST /upload/video/abort

Abort the user's current video upload request. This MAY require passing a token associated with the current request.

Returns:

POST /upload/video/finalize

Finalize the user's current video upload request. This should only be called after all parts are successfully uploaded and the client finalizes the upload on S3.

Returns:

jacksonargo commented 4 years ago

I think you brought up a reasonable point regarding uploads. I'm not really sure how that would look from two separate browser windows, but it doesn't matter too much to me. Implementation-wise, we're going to need a redis session anyway to correlate some cookie value to at least the DO abort request. So I don't think either way will add more than 1 or 2 if blocks.

For the GET request, would the client be able to store this data locally? It could, and probably should, come in the response of the first POST to vvgo.org, and then we can store it in browser memory. Would there be a situation where the browser could lose this info, but the upload would otherwise be resumable?

Suppose we don't get a finalize request or an abort request. Does S3 time out upload requests, or should we implement some timeout server-side?

0az commented 4 years ago

For the GET request, would the client be able to store this data locally? It could, and probably should, come in the response of the first POST to vvgo.org, and then we can store it in browser memory. Would there be a situation where the browser could lose this info, but the upload would otherwise be resumable?

I guess we can include a SHA hash with the Video Upload Metadata, as well as the signed URL. Use WebCrypto to hash the blob, etc.

We don't really need to store it locally.

Suppose we don't get a finalize request or an abort request. Does S3 time out upload requests, or should we implement some timeout server-side?

We need to deal with it manually. The idiomatic way is through a lifecycle rule, IIRC. Scheduled tasks, woo.

jacksonargo commented 4 years ago

I guess we can include a SHA hash with the Video Upload Metadata, as well as the signed URL. Use WebCrypto to hash the blob, etc.

That seems extra complicated. I was thinking of something along the lines of the JS app getting restarted, like a page refresh; would it be able to resume in that scenario? I'm trying to understand why the GET request is necessary. (I don't have a problem with it, I'm just not sure how it would be used.)

Lifecycle management for the upload request makes sense. We've already got the start/stop semantics. We'll just need a garbage collector to make sure everything is nice and tidy. The only benefit here will be in removing cruft from object storage... which only matters if we start hitting 500GB+, so I think we can de-prioritize that work.

jacksonargo commented 4 years ago

By storing locally, I meant storing in memory, not as a file on disk or anything.

0az commented 4 years ago

That seems extra complicated. I guess I was thinking something along the lines of the js app getting restarted, like page refresh or something, would it be able to resume in that scenario? I'm trying to understand why the GET request is necessary. (I don't have a problem with it, I'm just not sure how it would be used).

Hmm. I guess we can persist the URL plus a hash and a segment count in localStorage.

jacksonargo commented 4 years ago

I'm not suggesting that we should support retries in that scenario. I guess I just don't understand the lifecycle of local variables in JavaScript. It seems like after the first POST request, you could just store the response in a local variable and access it as needed during the upload to S3. Is that not the case?

0az commented 4 years ago

Oh, I was talking about the restart/resume case, not the happy path.

Yeah, the happy path has everything in a local var.

jacksonargo commented 4 years ago

Ok cool that makes sense!

Regarding restarts/resumes: how would that work from the client side? :thinking: What info does the client need to upload the file anyway? Is the path on disk enough, or is there other stuff that's needed?

0az commented 4 years ago

🤔 What info does the client need to upload the file anyways? Is path on disk enough or is there other stuff that's needed?

It'd be a blob/stream through the File API, so we'd keep track locally of which chunks were sent, as well as the ETag responses and chunk SHA sums. We need them to finalize the S3 side anyway.
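Roughly, the per-chunk bookkeeping looks like this. Sketched in Go rather than the client-side JS (the browser would hash with WebCrypto instead of crypto/sha256), and the names are made up:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkDigests splits data into fixed-size parts and returns the
// SHA-256 of each, mirroring what the client would track per part
// (alongside the ETag each UploadPart response returns).
func chunkDigests(data []byte, partSize int) []string {
	var sums []string
	for off := 0; off < len(data); off += partSize {
		end := off + partSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[off:end])
		sums = append(sums, hex.EncodeToString(sum[:]))
	}
	return sums
}

func main() {
	// 34 bytes split into 8-byte parts -> 5 parts (the last one short).
	sums := chunkDigests([]byte("pretend this is a large video file"), 8)
	fmt.Println(len(sums)) // 5
}
```

On resume, re-hashing a chunk and comparing against the stored digest tells the client whether that part still needs to be re-sent.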

jacksonargo commented 4 years ago

I'm reading through the API docs for multipart upload, and now I'm kind of understanding the limitations here. We definitely don't need any S3 client code. These are all pretty straightforward query params that we can add to the presigned URL, and that really makes me happy.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html
https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html
https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html

:thinking: Do we want to store the etags in redis?

I feel like we don't need to. I think we can spool the responses from S3 and then just send one batch payload to vvgo.org/finalize. If the client app is interrupted or whatever, it can just start from scratch.

jacksonargo commented 4 years ago

Also, I want to generalize this api for sheet music and click tracks file upload as well, not just video uploads.

jacksonargo commented 4 years ago

However, we only need to make the view for video submissions.

jacksonargo commented 4 years ago

I think we're not going to do this / will de-prioritize it in favor of notification webhooks from Dropbox or Google Forms.