rjbs / Email-MIME

perl library for parsing MIME messages

filename() is no longer decoded since 1.949 #76

Closed bokutin closed 3 years ago

bokutin commented 3 years ago

Hello.

% cat test2.pl 
#!/usr/bin/env perl

use Modern::Perl;
use Email::MIME;
use YAML::Syck;  # provides Dump; the import was missing (module assumed from the output format)

my $raw = <<'RAW';
Content-Type: image/jpeg;
 name="=?UTF-8?B?cHVibGljLmpwZw==?="
RAW
my $email = Email::MIME->new($raw);

say $email->content_type;
say $email->filename;
say Dump $email->{ct};

% perl -I Email-MIME-1.946/lib test2.pl 
image/jpeg; name="public.jpg"
public.jpg
--- 
attributes: 
  name: public.jpg
composite: jpeg
discrete: image
subtype: jpeg
type: image

% perl -I Email-MIME-1.949/lib test2.pl
image/jpeg; name="public.jpg"
=?UTF-8?B?cHVibGljLmpwZw==?=
--- 
attributes: 
  name: =?UTF-8?B?cHVibGljLmpwZw==?=
composite: jpeg
discrete: image
subtype: jpeg
type: image

% perl -E 'use Email::MIME; say $Email::Simple::VERSION; say $Email::MIME::ContentType::VERSION'
2.216
1.024

Is this the correct behavior? Has the specification changed?

I think this has an effect. https://github.com/rjbs/Email-MIME/commit/94e3034d8a4eb25605d1c80defa7090831ef750a

Thanks,

bokutin commented 3 years ago

I would be grateful if you could answer. @dlucredativ @rjbs

dlucredativ commented 3 years ago

I did not intend to break anyone's use case. However, unless I am missing some RFC, one cannot expect name parameter values to be decoded. From RFC 2047:

An 'encoded-word' MUST NOT appear within a 'quoted-string'.

RFC 2231 addresses this limitation with an extended parameter syntax, used here in the Content-Disposition header. Thunderbird, at least, emits both parameters, and when both are present, the Content-Type name parameter is ignored by Email::MIME (as well as by Thunderbird), so one should get proper decoding:

root@3091e6e2ed1c:/# cat test.pl 
#!/usr/bin/env perl

use v5.20;
use Email::MIME;

binmode(STDOUT, ":utf8");

say $Email::MIME::VERSION;

my $raw = <<'RAW';
Content-Type: image/jpeg;
 name="=?UTF-8?B?cHVibGljLmpwZw==?="
Content-Disposition: attachment;
 filename*=UTF-8''%70%75%62%6C%69%63%2E%6A%70%67
RAW
my $email = Email::MIME->new($raw);

say $email->content_type;
say $email->filename;
root@3091e6e2ed1c:/# perl test.pl
1.949
image/jpeg; name="public.jpg"
public.jpg
root@3091e6e2ed1c:/# 

pali commented 3 years ago

Yes. Content-Type's name parameter cannot contain RFC 2047 encoded strings, which means that the file name =?UTF-8?B?cHVibGljLmpwZw==?= should be treated as is; trying to decode it via RFC 2047 would be against the standards. So it seems that Email::MIME does this correctly here and does not violate the standards.

As @dlucredativ pointed out, Unicode strings may appear in Content-Disposition's filename attribute, encoded according to RFC 2231.
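
To make the RFC 2231 form concrete, here is a minimal sketch of decoding a single extended parameter value by hand, using only core Perl. The helper name `decode_rfc2231_value` is hypothetical and is not part of Email::MIME; the sketch also ignores parameter continuations (`filename*0*=`, `filename*1*=`, ...), which a real parser has to handle.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper for illustration only; not Email::MIME's API.
# Decodes one RFC 2231 extended value: charset'language'percent-encoded-octets
sub decode_rfc2231_value {
    my ($value) = @_;
    my ($charset, $language, $octets) = split /'/, $value, 3;
    return $value unless defined $octets;          # not in extended form

    # Turn each %XX escape back into the octet it represents.
    $octets =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;

    # Interpret the octets in the declared charset (assume US-ASCII if empty).
    return decode($charset || 'US-ASCII', $octets);
}

print decode_rfc2231_value("UTF-8''%70%75%62%6C%69%63%2E%6A%70%67"), "\n";
```

With the value from the example message above, this yields the same file name that `$email->filename` reports.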

Email::MIME should be able to correctly use Content-Disposition's filename attribute in the $email->filename method and decode it correctly.

If you have special requirements that violate the standards, you have to handle them in your application code. In my opinion a generic library should not do it. Also, Email::MIME does not generate a Content-Type name attribute that goes against the standard.
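
A minimal sketch of such an application-side workaround, assuming you deliberately want the non-standard behavior of decoding RFC 2047 encoded-words in file names. The helper name `decode_legacy_filename` is hypothetical; it relies only on the `MIME-Header` encoding that ships with the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical application-side helper; not part of Email::MIME.
# Decodes RFC 2047 encoded-words in a file name, which the standards
# do not permit in that position, and leaves other names untouched.
sub decode_legacy_filename {
    my ($name) = @_;
    return $name unless defined $name;

    # Only touch values that look like =?charset?B?...?= or =?charset?Q?...?=
    return $name unless $name =~ /=\?[^?]+\?[BbQq]\?[^?]*\?=/;

    return decode('MIME-Header', $name);
}

print decode_legacy_filename('=?UTF-8?B?cHVibGljLmpwZw==?='), "\n";
```

In an application this would wrap the value returned by `$email->filename`, so the library itself stays standards-compliant and the leniency is confined to the caller.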

In the past I implemented support in Email::MIME for correctly generating Content-Type's name and Content-Disposition's filename attributes. Content-Type's name should always be restricted to a subset of 7-bit ASCII, so Unicode strings are automatically "downgraded" to 7-bit ASCII.

bokutin commented 3 years ago

Thanks to both of you for your detailed explanations.

I understand that the behavior has changed, but it has been changed for the better.

I will try to deal with this on the application side.

I would like to continue using Email::MIME and Email::Stuffer.

Thank you very much!