Closed emilsedgh closed 5 years ago
Hi! I'm not familiar with that module, so can't really say for sure. This module tries to adhere to all specifications around the header field, though. Is there anything more specific perhaps I can answer? I skimmed though the linked issue but I don't see mention of this module and not really sure how you plan to fit them together and what, if any, errors happen when you do.
Sorry, I guess you do have an error posted. Is that when using this module? How does the signature calculation work? I guess reading it more, it sounds like the aws-sdk signature calculation is just broken, but it won't be fixed?
You can specify an argument to the fallback option that only contains ascii characters if the aws-sdk is broken as it reads like, or perhaps make a pull request to encode http headers correctly for signature calculation?
Alright let me try this from another angle.
const cp = require('content-disposition');
cp(String.fromCharCode(80, 80, 160, 80))
cp('©')
None of these are translated into filename*
. Instead, they are kept as filename
.
Are they valid according to RFC 5987?
Yes, that example is 100% valid according to the specification.
The specification for the Content-Disposition
header is https://tools.ietf.org/html/rfc6266
The grammar for the header is in section 4.1 (https://tools.ietf.org/html/rfc6266#section-4.1):
content-disposition = "Content-Disposition" ":"
disposition-type *( ";" disposition-parm )
disposition-type = "inline" | "attachment" | disp-ext-type
; case-insensitive
disp-ext-type = token
disposition-parm = filename-parm | disp-ext-parm
filename-parm = "filename" "=" value
| "filename*" "=" ext-value
disp-ext-parm = token "=" value
| ext-token "=" ext-value
ext-token = <the characters in token, followed by "*">
Defined in [RFC2616]:
token = <token, defined in [RFC2616], Section 2.2>
quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
value = <value, defined in [RFC2616], Section 3.6>
; token | quoted-string
Defined in [RFC5987]:
ext-value = <ext-value, defined in [RFC5987], Section 3.2>
The plain filename is defined above as filename-parm = "filename" "=" value
and value
is defined as
token = <token, defined in [RFC2616], Section 2.2>
quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
value = <value, defined in [RFC2616], Section 3.6>
; token | quoted-string
Note that the output of your example has put the filename inside double quotes, which means it is the quoted-string
ABNF above.
Going to the referenced RFC (https://tools.ietf.org/html/rfc2616#section-2.2), here is the quoted-string
definition:
quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext = <any TEXT except <">>
TEXT = <any OCTET except CTLs,
but including LWS>
The above example String.fromCharCode(80, 80, 160, 80)
encodes into the octets 50 50 a0 50
. All of those octets are valid for the TEXT
ABF grammar, as they are all OCTET
(<any 8-bit sequence of data>
) and none are CTL
(<any US-ASCII control character (octets 0 - 31) and DEL (127)>
).
Please correct me if I made a mistake.
No you're right. Thank you so much for the great library and my apologies for the irrelevant and invalid issue.
Hi @emilsedgh it's no problem at all. This has me wondering if it might be useful if the module added a feature where you could do fallback: false
, in which case it would not output the filename
parameter at all and instead only output the filename*
parameter.
The biggest downside to this is many (most?) parsers will only ever look at the filename
parameter, which is why this module tries hard to always have one, if possible, currently. This may not be an issue with your use-case, however, though I'm not sure. From the thread in awk-sdk
, it would seem like this may not be an issue for that particular service, though.
It would seem to me, though, that getting that error would indicate that the client (awk-sdk
) is calculating the signature differently that the server. If this value is causing the trip up, it would seem likely that the server is handing the character correcrtly while the client is not, leading to the signature mismatch. It seems odd that the client not calculating the signature the same way the server is is not considered a bug in the client (??).
Just greping through the aws-sdk
source code, it would seem the mistake is a disconnect between like https://github.com/aws/aws-sdk-js/blob/5d868465426a66c9bdae1f3f0bb67aeca5a5825f/lib/signers/v4.js#L123 and https://github.com/aws/aws-sdk-js/blob/5d868465426a66c9bdae1f3f0bb67aeca5a5825f/lib/signers/v4.js#L176 where the headers are concatenated into a JavaScript string (which is internally just UCS-2 encoded), but when Node.js is going to write that string out to the wire, it will do so as ISO-8859-1, following the specification, but the signature function in the SDK (which also needs to turn the internal JavaScript string into octets) encods using UTF-8 instead, causing the mismatch.
Node.js source where header string is encoded into octets for wire transfer: https://github.com/nodejs/node/blob/master/lib/_http_outgoing.js#L220-L233 (note the specification to the encoding that the headers are the latin1
encoding).
The location where ultimately that string in the sdk is turned into octets (in the form of a Node.js Buffer
object) is https://github.com/aws/aws-sdk-js/blob/c175cb2b89576f01c08ebf39b232584e4fa2c0e0/lib/util.js#L415 at which time since the encoding is not specified, it is encoded as UTF-8.
I hope this helps!
Hi Doug.
On aws-sdk-js#1648 the authors suggested the safest way to use that library is to send US-ASCII characters for filename
and everything else on filename*
.
So for now my solution was to strip my filename from any non-ascii character and provided that as a callback to content-disposition.
const fallback = name.replace(name, /[^\x00-\x7F]/ig, '?')
const ContentDisposition = contentDisposition(name, {
type: 'inline',
fallback
})
However, I think it's in everyone's best interest if aws-sdk would resolve the issue. Your detailed investigation would be valuable to the aws-sdk-js#1648.
As for fallback: false
, I agree with your comment. It may break many clients.
Do you find value in providing us-ascii as a fallback option?
I'm using this great package to set the proper content disposition on files I upload to S3 using aws-sdk.
In some cases I get this error:
The file I'm trying to upload has a non-breaking-space in it.
Here's the relevant aws-sdk issue