Closed jrochkind closed 5 years ago
AWS S3 generally supports direct uploads via either a POST or a PUT request.
Generating parameters for a POST multipart/form-data request supports the "constraints" you've mentioned – `:content_length_range`, `:content_type` etc. – see `#presigned_post` and `PresignedPost` for more details. I think this feature exists to allow uploads via an HTML form. BTW, this is what `Shrine::Storage::S3#presign` will call by default.
```rb
require "aws-sdk-s3"
require "http"

resource = Aws::S3::Resource.new(...)
bucket   = resource.bucket("my-bucket")
object   = bucket.object("my-key")

data = object.presigned_post(
  content_length_range: 0..4,
  content_type: "image/jpeg",
)

# "big file" is 8 bytes, which exceeds the 0..4 range
form_data = data.fields.merge(file: HTTP::FormData::Part.new("big file", content_type: "image/jpeg"))
response  = HTTP.post(data.url, form: form_data)
response.status.to_s # => "400 Bad Request"
response.body.to_s   # =>
# <?xml version="1.0" encoding="UTF-8"?>
# <Error><Code>EntityTooLarge</Code><Message>Your proposed upload exceeds the maximum allowed size</Message><ProposedSize>8</ProposedSize><MaxSizeAllowed>4</MaxSizeAllowed><RequestId>D5924B5779B2B15D</RequestId><HostId>9PZU/HAkCIk5poeAri1AGDi7wliabT+IRrUhZZqSDAK8kWFKMHZXFB/aV6h8hx8VUWF/2jFT6/c=</HostId></Error>

# "file" is 4 bytes (within range), but declares a different content type
form_data = data.fields.merge(file: HTTP::FormData::Part.new("file", content_type: "video/mp4"))
response  = HTTP.post(data.url, form: form_data)
response.status.to_s    # => "204 No Content"
object.get.content_type # => "image/jpeg"
```
However, in the example above I'm surprised that specifying a different content type succeeded. I tried with `:content_type_starts_with`, but then I got a `403 Forbidden` even for a correct content type 🤷‍♂️
Generating parameters for a PUT request supports similar "constraints", but they're more exact – they are the same parameters as for `#put`, which is used for uploading. For example, you cannot specify a `:content_length_range`, only an exact `:content_length`. But it also supports additional parameters that `#presigned_post` doesn't, such as `:content_md5` for specifying checksums.
```rb
# ...
url = object.presigned_url(:put,
  content_length: 4,
  content_type: "image/jpeg",
)

response = HTTP
  .headers("Content-Type" => "image/jpeg")
  .put(url, body: "big file") # 8 bytes, more than the signed content_length
response.status.to_s # => "200 OK"

response = HTTP
  .headers("Content-Type" => "video/mp4")
  .put(url, body: "file")
response.status.to_s # => "403 Forbidden"
response.body.to_s   # =>
# <?xml version="1.0" encoding="UTF-8"?>
# <Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>
# ...
```
It looks like we get surprising behaviour here as well. While an upload with a different `:content_type` was now properly rejected (with a generic "signature does not match" error), the `:content_length` constraint didn't seem to work, because we could still upload a larger file successfully. So, I'm a bit confused 🤷‍♂️
Direct multipart upload requires presigned URLs for PUT uploads, and that is what uppy-s3_multipart uses. In other words, you cannot use presigned POST parameters for multipart uploads.
Okay, thanks, that's helpful background to orient me. So many moving parts, this gets so confusing! But I'll close this issue for now!
Hmm, now that I did, though... So right, for multipart upload there are of course several PUT requests to S3.

I wonder if the uppy-s3_multipart app could/should take a maximum file size param, such that it refuses to generate PUT requests for a file that is too large. Either up front, based on a Content-Length if it gets one... or, after already generating so many PUT requests for parts, refuse the N+1th request for a signed PUT that would put it over the line?

Or does it not really have the context to do even that, based on what the uppy front-end is giving it? It's all quite confusing.
Just to add, I now realized that uppy-s3_multipart generates parameters for `#upload_part` (as opposed to `#put`, which I talked about), and it seems that `#upload_part` supports far fewer parameters than `#put`.
In any case, you can pass additional parameters to that operation like this:
```rb
Shrine.plugin :uppy_s3_multipart, options: {
  prepare_upload_part: { ... },
}
```
Probably doesn't help much, though.
> I wonder if the uppy-s3_multipart app could/should take a maximum file size param, such that it refuses to generate put requests for a file that is too large. Either up front, based on a content-length if it gets one...
It's not possible to constrain the filesize, either of the total upload or of an individual part. uppy-s3_multipart doesn't know the filesize of individual parts when completing the multipart upload, and the `#upload_part` presign doesn't accept any content length constraints.
> or just after already generating so many PUT requests for parts, on the N+1th request for a signed PUT, that would put it over the line... refuse to do it?
uppy-s3_multipart is a stateless API, so it doesn't know how many PUT requests have been generated. It also doesn't know how large the chunks will be, so it wouldn't know how many PUT presigns are too many.
tus-ruby-server is the only one that can constrain the maximum filesize during upload.
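For reference, tus-ruby-server exposes this as a server-level option. A minimal configuration sketch (the 1 GB limit is an arbitrary example); per the tus protocol, an upload whose declared `Upload-Length` exceeds `Tus-Max-Size` is rejected with `413 Request Entity Too Large`:

```rb
# e.g. in config.ru or an initializer, assuming the tus-server gem
require "tus/server"

Tus::Server.opts[:max_size] = 1024 * 1024 * 1024 # 1 GB
```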
The platform-independent AWS docs suggest that when pre-signing requests for direct S3 uploads, a "security policy" can be included. They even seem to suggest that a "security policy" must be included... but I can't figure out if or how this could be done using uppy-s3_multipart. Maybe this is inapplicable to how uppy or this backend is doing things, somehow?
https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html#sigv4-post-signature-calc
https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-HTTPPOSTConstructPolicy.html
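As a sketch of what those docs describe: the policy signature is computed with a SigV4 signing key derived via a chain of HMAC-SHA256 over the date, region, service, and the literal `"aws4_request"`, then applied to the base64-encoded policy. Pure stdlib Ruby; the secret key and date below are made-up placeholders:

```rb
require "openssl"

# Derive the SigV4 signing key: chained HMAC-SHA256 as per the AWS docs
def signing_key(secret_key, date, region, service)
  k_date    = OpenSSL::HMAC.digest("sha256", "AWS4" + secret_key, date)
  k_region  = OpenSSL::HMAC.digest("sha256", k_date, region)
  k_service = OpenSSL::HMAC.digest("sha256", k_region, service)
  OpenSSL::HMAC.digest("sha256", k_service, "aws4_request")
end

key = signing_key("FAKE/SECRET/KEY", "20190101", "us-east-1", "s3")

# The POST policy signature is then the hex-encoded HMAC-SHA256 of the
# base64-encoded policy document (placeholder string here) under that key
signature = OpenSSL::HMAC.hexdigest("sha256", key, "<base64-encoded policy>")
```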
Or maybe uppy is handling the security policy, somehow, rather than the back-end?
Theoretically it sounds like the security policy would allow the back-end to securely specify a max-size and allowable content-type, that if the upload doesn't match, S3 will not allow it to complete.
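For illustration, such a policy document might look like the following (all values are placeholders; with aws-sdk-s3, the `#presigned_post` helper builds and signs an equivalent policy for you). S3 rejects any upload that violates one of the conditions:

```rb
require "json"
require "base64"

policy = {
  "expiration" => "2019-01-01T12:00:00.000Z",
  "conditions" => [
    { "bucket" => "my-bucket" },
    ["starts-with", "$key", "cache/"],
    ["content-length-range", 0, 10 * 1024 * 1024], # reject bodies over 10 MB
    ["eq", "$Content-Type", "image/jpeg"],         # pin the content type
  ],
}

# The policy is base64-encoded and sent as the "policy" form field,
# alongside its SigV4 signature in "x-amz-signature"
encoded_policy = Base64.strict_encode64(JSON.generate(policy))
```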
Giving the uppy JS a max file size in the "restrictions" arg happens entirely on the front-end, so it isn't reliable against a malicious front-end that would like to upload a larger file to S3 than you want to allow.
It seems like there should be some way to use this S3 signing feature to enforce a maximum file size on the back-end, in a way that can't be overridden by the front-end, even though the front-end is uploading directly to S3.
But I can't quite figure out how all the pieces go together. Any ideas?