mikel / mail

A Really Ruby Mail Library
MIT License
3.62k stars 936 forks source link

Cannot get filename on attachments (Mail::Message#has_attachments? raises an exception) #104

Closed zettabyte closed 14 years ago

zettabyte commented 14 years ago

Example E-Mail: http://gist.github.com/606626

The above referenced email breaks Mail::Message#has_attachments? (and a few other things). Specifically, the message is a multipart email with two parts, the second of which is an attachment (starts on line 42) who's header(s) is(are) the source of the problem(s). I'm running gem version 2.2.6.1 on ruby 1.8.7. If you save the linked email to a file named test.eml on your system and run the following in irb:

irb(main):001:0> require 'rubygems' ; require 'mail'
=> true
irb(main):002:0> msg = Mail.read("test.eml")
=> #<Mail::Message:702990... (omitted for brevity)
irb(main):003:0> msg.has_attachments?
NoMethodError: undefined method `filename' for #<Mail::UnstructuredField:0x7fdf8ff66960>
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/field.rb:122:in `send'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/field.rb:122:in `method_missing'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/message.rb:1879:in `find_attachment'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/message.rb:1703:in `attachment?'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/attachments_list.rb:11:in `initialize'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/parts_list.rb:11:in `map'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/parts_list.rb:11:in `each'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/parts_list.rb:11:in `map'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/attachments_list.rb:7:in `initialize'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/parts_list.rb:5:in `new'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/parts_list.rb:5:in `attachments'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/message.rb:1508:in `attachments'
        from /usr/lib/ruby/gems/1.8/gems/mail-2.2.6.1/lib/mail/message.rb:1512:in `has_attachments?'
        from (irb):3
        from :0

The root cause or the first problem is the same thing that caused Issue #82 (I guess this patch: http://gist.github.com/472661 never got merged in or was removed?). The Content-Disposition header of the second message part has a value of ATTACHMENT (line 45) in all caps which the parser doesn't like (it expects a lower-case-only value). So, the first problem is really a request to reopen issue #82 as it appears to not have the fix merged into gem version 2.2.6.1.

However, if I manually merge the fix (http://gist.github.com/472661; I'm assuming this works, the code looks right) or just manually edit the email file and change ATTACHMENT to attachment (line 45) in the email and try again, #has_attachments? returns false. This is an improvement, but is also untrue.

Now the problem is that in the Content-Type header the name parameter is (again) in upper-case (line 43) and the filename parameter in the Content-Disposition header is also in upper-case. (In fact, if you look at all the Content-X headers for both parts, all parameter names are in upper-case).

I can manually query for these parameters:

irb(main):007:0> msg.parts.last.header[:content_disposition].parameters["FILENAME"]
=> "attachment.html"
irb(main):008:0> msg.parts.last.header[:content_type].parameters["NAME"]
=> "attachment.html"

But trying accessor methods fails since they rely on a lower-case string as the hash key:

irb(main):009:0> msg.parts.last.header[:content_disposition].filename
=> nil
irb(main):010:0> msg.parts.last.header[:content_type].filename
=> nil

From the little bit of code-reading I've been doing, I'm not sure what the best fix is. I do know that, broken though they may be, apparently several "sources" of email out there do up-case these parameter names (my real-world email was an auto-generated one from Wells Fargo). I'm wondering if, for well-known "parameters" code should be added to normalize/downcase the parameter name as they're added to the @parameters Hash (or ParameterHash). I'm not sure if this would best be in the respective XxField or XxElement class. But, well, there it is.

zettabyte commented 14 years ago

I think from reading the BNF grammar in RFC 2045 that parameter names (called "attributes" in the RFC) are to be matched in a case-insensitive manner.

http://www.faqs.org/rfcs/rfc2045.html (see Appendix A at the bottom, specifically the "attribute" and "parameter" rules)

If this is so, then perhaps all parameter names should be mapped to lower-case values before being stored and used as hash keys. Just a thought.

zettabyte commented 14 years ago

Hmm, while I'm waiting for a reply from the mailing list regarding why running the spec tests on my forked and cloned copy fails, I've been looking at writing a fix to the underlying problem here.

I re-verified that in RFC 2045 and RFC 2183 that attributes (the name-part of a parameter of all content-* headers) are to use case-insensitive matching. So, my first-blush simple solution would be ensure that in the field class for each content-* header the keys in the ParameterHash be stored as lower-case strings.

After glancing at the specs for the ContentTypeField class, however, I see there are lots of comparisons of the parameters hash with a literal hash, some of which include upper-case letters. So, my solution would involve touching a lot of test code.

Another alternative is to modify the ParameterHash class itself, specifically the [] method to perform a case-insensitive search. However, I'm not sure what else this might break as I'm not sure how extensively it's used in contexts that might require case-sensitivity (though the comments in the code for the hash seem to indicate it's used for MIME extension headers only).

Comments?

mikel commented 14 years ago

I applied your patches on this, please re-open if still an issue