mikel / mail

A Really Ruby Mail Library
MIT License

Parsing a `Mail::Message` from a string ignores the string's encoding #1585

Open ghost opened 10 months ago

ghost commented 10 months ago

The following snippet attempts to parse a mail message:

require 'mail'
require 'base64'

msgtxt = "RnJvbTogIkxldJJzIEtldG8gQ2Fwc3VsZXMiIDxhQGIuY28+DQpTdWJqZWN0\nOiBtZWgNCg0KQm9keQ0K\n"
msg = Mail.new(Base64.decode64(msgtxt))
msg.errors.each{|err| puts "Error: #{err}" }

The From: line of the message contains a byte sequence that is malformed when treated as UTF-8. (The message is Base64-encoded here to preserve those bytes; I found the original in a spam email.)
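To make the malformed bytes concrete, here is a minimal check that uses only core Ruby, so it runs without the mail gem. It decodes the same payload with `String#unpack1('m')` (core Ruby's Base64 decoder, used here in place of `Base64.decode64` so that no `require` is needed) and looks for non-ASCII bytes in the first line. The variable names are only illustrative.

```ruby
# Base64 payload copied verbatim from the snippet above.
msgtxt = "RnJvbTogIkxldJJzIEtldG8gQ2Fwc3VsZXMiIDxhQGIuY28+DQpTdWJqZWN0\nOiBtZWgNCg0KQm9keQ0K\n"

# unpack1('m') decodes Base64 and returns a binary (ASCII-8BIT) string.
raw = msgtxt.unpack1('m')

# Relabel a copy as UTF-8 and ask Ruby whether the bytes are well-formed.
valid_as_utf8 = raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Collect every byte above the ASCII range in the first (From:) line.
from_line = raw.lines.first
offending = from_line.bytes.select { |b| b > 0x7F }

puts "decoded encoding: #{raw.encoding}"
puts "valid as UTF-8?   #{valid_as_utf8}"
puts "non-ASCII bytes:  #{offending.map { |b| format('0x%02X', b) }.join(', ')}"
```

The only non-ASCII byte is 0x92, which sits between "Let" and "s" in the display name. A lone 0x92 is a UTF-8 continuation byte with no lead byte, which is why the line cannot be read as UTF-8.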

I can read the message in as ASCII-8BIT (e.g. by setting the input stream's encoding). However, when I pass it to Mail.new, parsing fails: Mail::Message defaults to UTF-8 and attempts to convert the value, which cannot work on the malformed text.
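For reference, this is roughly what reading the message as ASCII-8BIT looks like in plain Ruby. It is a sketch under assumptions: `read_message_binary` is a hypothetical helper, and the temp file stands in for wherever the message actually comes from.

```ruby
require 'tempfile'

# Same bytes as the decoded message in this report; .b tags the literal as ASCII-8BIT.
raw_message = "From: \"Let\x92s Keto Capsules\" <a@b.co>\r\nSubject: meh\r\n\r\nBody\r\n".b

# Hypothetical helper: read a message file without letting Ruby tag it as UTF-8.
def read_message_binary(path)
  File.open(path, 'rb') do |io|
    # Redundant with 'rb'; shown because it also works on a stream opened elsewhere.
    io.set_encoding(Encoding::BINARY)
    io.read
  end
end

text_read = nil
binary_read = nil
Tempfile.create('msg') do |f|
  f.binmode
  f.write(raw_message)
  f.flush
  text_read = File.read(f.path, encoding: 'UTF-8') # tagged UTF-8, bytes unchanged
  binary_read = read_message_binary(f.path)
end

puts "text read:   #{text_read.encoding}, valid? #{text_read.valid_encoding?}"
puts "binary read: #{binary_read.encoding}, valid? #{binary_read.valid_encoding?}"
```

Both reads return the same bytes. Only the encoding label differs, and that label is the information Mail.new currently does not take into account.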

This looks like a bug to me. I would have expected it to use the string's encoding as the default charset.

Adding the following seems to fix this:

diff --git a/lib/mail/message.rb b/lib/mail/message.rb
index 5c7d40ab..1d9a1e2f 100644
--- a/lib/mail/message.rb
+++ b/lib/mail/message.rb
@@ -2117,6 +2117,7 @@ module Mail
     end

     def init_with_string(string)
+      @charset = string.encoding.to_s if @defaulted_charset
       self.raw_source = string
       set_envelope_header
       parse_message

(I can also work around this with Mail::Message.default_charset = 'ASCII-8BIT', but it still seems like something that could be improved.)
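To spell out the behaviour I am proposing, here is a toy model of the one-line change. `SketchMessage` is a hypothetical stand-in and not Mail::Message: it borrows only the `@charset` and `@defaulted_charset` names from the diff, and the `charset:` keyword is an assumption made for the illustration.

```ruby
# Hypothetical stand-in for illustration only; this is NOT Mail::Message.
class SketchMessage
  DEFAULT_CHARSET = 'UTF-8'

  attr_reader :charset

  def initialize(string, charset: nil)
    @charset = charset || DEFAULT_CHARSET
    @defaulted_charset = charset.nil?
    # The line the diff adds: inherit the input's encoding unless the caller chose one.
    @charset = string.encoding.to_s if @defaulted_charset
    @raw_source = string
  end
end

binary_input = "From: \"Let\x92s Keto Capsules\" <a@b.co>\r\n".b
utf8_input = String.new("Subject: meh\r\n", encoding: Encoding::UTF_8)

from_binary = SketchMessage.new(binary_input).charset
from_utf8 = SketchMessage.new(utf8_input).charset
explicit = SketchMessage.new(binary_input, charset: 'ISO-8859-1').charset

puts "binary input:     #{from_binary}"
puts "UTF-8 input:      #{from_utf8}"
puts "explicit charset: #{explicit}"
```

The point of the guard is that an explicitly chosen charset still wins, so only callers who relied on the default would see a difference, and only when their input string is not tagged as UTF-8.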