jshttp / content-disposition

Create and parse HTTP Content-Disposition header
MIT License
224 stars 43 forks source link

Is it safe to use with aws-sdk? #25

Closed emilsedgh closed 5 years ago

emilsedgh commented 5 years ago

I'm using this great package to set the proper content disposition on files I upload to S3 using aws-sdk.

In some cases I get this error:

SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method.

The file I'm trying to upload has a non-breaking-space in it.

Here's the relevant aws-sdk issue

dougwilson commented 5 years ago

Hi! I'm not familiar with that module, so can't really say for sure. This module tries to adhere to all specifications around the header field, though. Is there anything more specific perhaps I can answer? I skimmed though the linked issue but I don't see mention of this module and not really sure how you plan to fit them together and what, if any, errors happen when you do.

dougwilson commented 5 years ago

Sorry, I guess you do have an error posted. Is that when using this module? How does the signature calculation work? I guess reading it more, it sounds like the aws-sdk signature calculation is just broken, but it won't be fixed?

You can specify an argument to the fallback option that only contains ascii characters if the aws-sdk is broken as it reads like, or perhaps make a pull request to encode http headers correctly for signature calculation?

emilsedgh commented 5 years ago

Alright let me try this from another angle.

const cp = require('content-disposition');

cp(String.fromCharCode(80, 80, 160, 80))

cp('©')

None of these are translated into filename*. Instead, they are kept as filename.

Are they valid according to RFC 5987?

dougwilson commented 5 years ago

Yes, that example is 100% valid according to the specification.

The specification for the Content-Disposition header is https://tools.ietf.org/html/rfc6266

The grammar for the header is in section 4.1 (https://tools.ietf.org/html/rfc6266#section-4.1):

     content-disposition = "Content-Disposition" ":"
                            disposition-type *( ";" disposition-parm )

     disposition-type    = "inline" | "attachment" | disp-ext-type
                         ; case-insensitive
     disp-ext-type       = token

     disposition-parm    = filename-parm | disp-ext-parm

     filename-parm       = "filename" "=" value
                         | "filename*" "=" ext-value

     disp-ext-parm       = token "=" value
                         | ext-token "=" ext-value
     ext-token           = <the characters in token, followed by "*">

   Defined in [RFC2616]:

     token         = <token, defined in [RFC2616], Section 2.2>
     quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
     value         = <value, defined in [RFC2616], Section 3.6>
                   ; token | quoted-string

   Defined in [RFC5987]:

     ext-value   = <ext-value, defined in [RFC5987], Section 3.2>

The plain filename is defined above as filename-parm = "filename" "=" value and value is defined as

     token         = <token, defined in [RFC2616], Section 2.2>
     quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
     value         = <value, defined in [RFC2616], Section 3.6>
                   ; token | quoted-string

Note that the output of your example has put the filename inside double quotes, which means it is the quoted-string ABNF above.

Going to the referenced RFC (https://tools.ietf.org/html/rfc2616#section-2.2), here is the quoted-string definition:

       quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
       qdtext         = <any TEXT except <">>
       TEXT           = <any OCTET except CTLs,
                        but including LWS>

The above example String.fromCharCode(80, 80, 160, 80) encodes into the octets 50 50 a0 50. All of those octets are valid for the TEXT ABF grammar, as they are all OCTET (<any 8-bit sequence of data>) and none are CTL (<any US-ASCII control character (octets 0 - 31) and DEL (127)>).

Please correct me if I made a mistake.

emilsedgh commented 5 years ago

No you're right. Thank you so much for the great library and my apologies for the irrelevant and invalid issue.

dougwilson commented 5 years ago

Hi @emilsedgh it's no problem at all. This has me wondering if it might be useful if the module added a feature where you could do fallback: false, in which case it would not output the filename parameter at all and instead only output the filename* parameter.

The biggest downside to this is many (most?) parsers will only ever look at the filename parameter, which is why this module tries hard to always have one, if possible, currently. This may not be an issue with your use-case, however, though I'm not sure. From the thread in awk-sdk, it would seem like this may not be an issue for that particular service, though.

It would seem to me, though, that getting that error would indicate that the client (awk-sdk) is calculating the signature differently that the server. If this value is causing the trip up, it would seem likely that the server is handing the character correcrtly while the client is not, leading to the signature mismatch. It seems odd that the client not calculating the signature the same way the server is is not considered a bug in the client (??).

Just greping through the aws-sdk source code, it would seem the mistake is a disconnect between like https://github.com/aws/aws-sdk-js/blob/5d868465426a66c9bdae1f3f0bb67aeca5a5825f/lib/signers/v4.js#L123 and https://github.com/aws/aws-sdk-js/blob/5d868465426a66c9bdae1f3f0bb67aeca5a5825f/lib/signers/v4.js#L176 where the headers are concatenated into a JavaScript string (which is internally just UCS-2 encoded), but when Node.js is going to write that string out to the wire, it will do so as ISO-8859-1, following the specification, but the signature function in the SDK (which also needs to turn the internal JavaScript string into octets) encods using UTF-8 instead, causing the mismatch.

Node.js source where header string is encoded into octets for wire transfer: https://github.com/nodejs/node/blob/master/lib/_http_outgoing.js#L220-L233 (note the specification to the encoding that the headers are the latin1 encoding).

The location where ultimately that string in the sdk is turned into octets (in the form of a Node.js Buffer object) is https://github.com/aws/aws-sdk-js/blob/c175cb2b89576f01c08ebf39b232584e4fa2c0e0/lib/util.js#L415 at which time since the encoding is not specified, it is encoded as UTF-8.

I hope this helps!

emilsedgh commented 5 years ago

Hi Doug.

On aws-sdk-js#1648 the authors suggested the safest way to use that library is to send US-ASCII characters for filename and everything else on filename*.

So for now my solution was to strip my filename from any non-ascii character and provided that as a callback to content-disposition.

const fallback = name.replace(name, /[^\x00-\x7F]/ig, '?')

const ContentDisposition = contentDisposition(name, {
  type: 'inline',
  fallback
})

However, I think it's in everyone's best interest if aws-sdk would resolve the issue. Your detailed investigation would be valuable to the aws-sdk-js#1648.

emilsedgh commented 5 years ago

As for fallback: false, I agree with your comment. It may break many clients.

Do you find value in providing us-ascii as a fallback option?