Closed dxdc closed 7 years ago
Hi! Can you please describe in commit message (and also here on github) which UTF-8 issue it fixes and how? Because from this pull request it looks like it fixes some hash ordering to be deterministic and I have absolutely no idea how such thing is UTF-8 related.
Ideally with test example to show bug and that it was fixed.
And moreover this pull request does not fix issue #18 which say that "Email::MIME does not correctly escape the double quote." I do not see any implementation of escaping in this pull request. So probably it refer to another problem...
Hi @pali ... have updated description above with test example :)
... in the next run (this one is IMO correct/expected):
Content-Type: application/pdf; name="=?UTF-8?Q?k=C5=AF=C5=88=2Epdf?=" Content-Disposition: attachment; filename="=?UTF-8?Q?k=C5=AF=C5=88=2Epdf?="
As written in #18 this is incorrect too.
Correct fix for #18 is in pull request #51
@pali did you try the script? It gives alternating results as stated due to Perl hash order.
I know that it gives different result based on perl hash order. But it does not fix #18 nor any other UTF-8 related issue.
I agree that such change to make behavior predicable is OK, but current description is irrelevant to this change.
Sorry if there is confusion @pali. But, it does fix a UTF8 issue first mentioned in #18 as far as I know.
If I run this script repeatedly, I see the follow UTF8 errors in stderr
which only happen sporadically. These are resolved after merging in this patch. I agree that the fix just takes advantage of the sort order of the keys, and there is probably a better/more permanent fix, but this does seem to work.
Unquoted '"' not allowed at test.pl line 5.
Missing semicolon before parameter '"kůň.pdf"' at test.pl line 5.
Wide character in print at test.pl line 5.
Sorry if there is confusion @pali. But, it does fix a UTF8 issue first mentioned in #18 as far as I know.
Nope, it does not fix UTF-8 issue with Content-Type and Content-Disposition headers. As @rjbs (in https://github.com/rjbs/Email-MIME/issues/18#issuecomment-234929880) and also I (in https://github.com/rjbs/Email-MIME/pull/55#issuecomment-329805444) wrote, your following output is incorrect and wrong too.
Content-Type: application/pdf; name="=?UTF-8?Q?k=C5=AF=C5=88=2Epdf?="
Content-Disposition: attachment; filename="=?UTF-8?Q?k=C5=AF=C5=88=2Epdf?="
Correct output for filename kůň.pdf
is:
Content-Type: application/pdf; name*=UTF-8''k%C5%AF%C5%88.pdf
Content-Disposition: attachment; filename*=UTF-8''k%C5%AF%C5%88.pdf
Now I added above kůň.pdf
test to PR #51 to explicitly verify that UTF-8 filename issue is fixed correctly. Hopefully other people would see correct output and would not try proposing something wrong again.
@pali thank you for the clarification and adding the tests! I didn't understand before that the MIME-Q
encoding wasn't correct for filename spec.
So, investigating this then, I still have an issue and suggestion.
Suggestion: Reverse the order of filename and filename, so that filename comes after filename. This extends compatibility for clients that do not support filename*. See RFC6266.
If I run the script below, the filename is not encoded into UTF8, as it does nicely in your tests. This is why the encode
functions were used. Am I implementing this incorrectly?
print Email::MIME->create(
attributes => {
disposition => 'attachment',
encoding => 'base64',
filename => "kůň.pdf",
content_type => 'application/pdf',
name => "kůň.pdf",
},
body => 'XXXX',
)->as_string;
Output:
MIME-Version: 1.0
Content-Type: application/pdf; name="kůň.pdf"
Content-Disposition: attachment; filename="kůň.pdf"
Content-Transfer-Encoding: base64
hm... I have not find requirement of suggestion that filename* should be after filename. I think order is not important. Or is there any explicit statement that not? Also RFC6266 is for HTTP, not for emails.
have you applied both patches? (one for Email-MIME and one for Email-MIME-ContentType)
Many user agent implementations predating this specification do not understand the "filename" parameter. Therefore, when both "filename" and "filename" are present in a single header field value, recipients SHOULD pick "filename" and ignore "filename". This way, senders can avoid special-casing specific user agents by sending both the more expressive "filename" parameter, and the "filename" parameter as fallback for legacy recipients (see Section 5 for an example).
...
Direct the UA to show "save as" dialog, with a filename containing the Unicode character U+20AC (EURO SIGN):
Content-Disposition: attachment; filename*= UTF-8''%e2%82%ac%20rates
Here, the encoding defined in [RFC5987] is also used to encode the non-ISO-8859-1 character.
This example is the same as the one above, but adding the "filename" parameter for compatibility with user agents not implementing RFC 5987:
Content-Disposition: attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
Note: Those user agents that do not support the RFC 5987 encoding ignore "filename*" when it occurs after "filename".
I agree that most modern clients won't have an issue.
Email::MIME is up to date (1.946).
Email::MIME::ContentType is up to date (1.022).
In first cite part there is no information about order. In second cite part is example and I understood it as example in which both filename* and filename was present without any importance that it would not work when order is reversed... But in RFC2045 which is relevant for emails (RFC6266 is for http), there is a note in section https://tools.ietf.org/html/rfc2045#section-5
After the media type and subtype names, the remainder of the header field is simply a set of parameters, specified in an attribute=value notation. The ordering of parameters is not significant.
Have you applied both patches from both pull requests?
use utf8;
use Email::MIME;
print Email::MIME->create(
attributes => {
disposition => 'attachment',
encoding => 'base64',
filename => "kůň.pdf",
content_type => 'application/pdf',
name => "kůň.pdf",
},
body => 'XXXX',
)->as_string;
Date: Sun, 17 Sep 2017 08:35:42 +0200
MIME-Version: 1.0
Content-Type: application/pdf; name*=UTF-8''k%C5%AF%C5%88.pdf; name=kun.pdf
Content-Disposition: attachment; filename*=UTF-8''k%C5%AF%C5%88.pdf;
filename=kun.pdf
Content-Transfer-Encoding: base64
WFhYWA==
Thanks @pali working as expected now. Appreciate your explanation, will close this PR.
This fixes the issue pointed out by @karel-m and @FGasper, which can be observed by running this code block multiple times:
As described in https://github.com/rjbs/Email-MIME/issues/18, it gives different results on subsequent runs, i.e.
This patch fixes it. It is caused by the fact that Perl hash keys are not ordered.