mikel / mail

A Really Ruby Mail Library
MIT License
3.62k stars 938 forks source link

mail gem breaks attachment file with encoding Quoted-Printable. #1241

Open kirikak2 opened 6 years ago

kirikak2 commented 6 years ago

I found this issue when I encode mail with binary file attachment in Quoted-Printable.

This is sample case.

require 'mail'

# 1. Create Mail with Quoted-Printable
mail = Mail.new
part = Mail::Part.new
bin = File.binread("./some_binary.jpg")
part.body = [bin].pack("M")
part.content_transfer_encoding = "quoted-printable"
part.transport_encoding = Mail::Encodings::QuotedPrintable
mail.parts << part
File.binwrite("./tmp/original.eml", mail.to_s)

# 2. Read mail and write

mail = Mail.read("./tmp/original.eml")
File.binwrite("./tmp/after.eml", mail.to_s)

This is difference between original.eml and after.eml.

diff --git 1/tmp/original.eml 2/tmp/after.eml
index 7d2c038..506176e 100644
--- 1/tmp/original.eml
+++ 2/tmp/after.eml
@@ -27,15 +27,16 @@ Content-Transfer-Encoding: quoted-printable
 =01=01=00=02=03=00=00=00=00=00=01=02=11=03=12!1=04A=05=13Q"a2q=15=81=B1=FF=
 =DA=00=0C=03=01=00=02=11=03=11=00?=00=1C|f=98=08=A2=C6i@=E2=BFW?9=07=B2A=14=
 =80=F7=A2=04=E4=FB=D2  =13=C5=02=07=B7=A5"=93=9A =1C=D2=8E=B4=0D=02=03=9E=94=
-=FBD=CCQ=12=93=CD8=14=08=08H=02=9Bl=8C=1A6=DC=F1L=13=8A=00=1E=DEim=10{
+=FBD=CCQ=12=93=CD8=14=08=08H=02=9Bl=8C=1A6=DC=F1L=13=8A=00=1E=DEim=10{=0D=
+
 ,sM=1C=98=A4=00=80=A5=B6=8B=B6=96=DA=07=E8=06=DAr=98=E9E=00=FBR=DB=88=A0@=
 =B6=E3=8C=D3G?=D6=8B=B78=A4=A0&=80=05=1C=D3m=A2m=A5=B6=80=07=B4SE=16&i=14=
 =FD(=001=DA=91O=D6=8AS=8E=0D4`=E6=90=D2=03=B6:R=83=EDE=8C=D3m=89=14=C0=19=
-Gz=8CGJ)=039=A6=89=C4=D0=82=81=ED=00qM=00=0E1E =C6M4b=90P";
+Gz=8CGJ)=039=A6=89=C4=D0=82=81=ED=00qM=00=0E1E =C6M4b=90P";=0D
 b=9Fl=D1H=3D)=B6=CF\=F5=A0(=14w=A6=DBE"=04=E6=A2S@=03=81=C7=EBM=B4DQ=00=C4=
 R=8A=07@=8AzTv=E4=D1J}=A9=A0=CD=00=08=A4v8=A8=C7<=D1=A2=A2S#=8A=01=A0[y=A8=
 =C1=E2=8D=B7=EBLA=C9=14=82=80=ED D=7F=D8=D4v=FE=94h"=98=8Fj`=90=18=A6#"=8D=
-=1E=D5=10=07JV:=04=13Q=8C=9A)=07=E1M=04=0CM =02=A4=8C=FB=D4
+=1E=D5=10=07JV:=04=13Q=8C=9A)=07=E1M=04=0CM =02=A4=8C=FB=D4=0D
 z=8A6=DE=D4=D1=88=E9@#=AA      =81=EFNR>=B4M=A7=A5 =9CI=AB%=03=DA =E2=96=DFj$=01=
 Kn S=00a<=C6iG1=D6=88=13=8CR=03=BD =06=13=CC=D2=DB=EDD#=9A@c=A5=00=0C=8AQ=

This problem cause from ruby Array#pack("M") behavior. Ruby's Array#pack("M") doesn't support Quoted-Printable with binary, so ruby does not encode "\n".

["\n"].pack("M") # => "\n"

and Mail library converts "\n" to "\r\n" in to_crlf. so break binary file. I found similar issue in #1010

I asked to ruby community. (https://bugs.ruby-lang.org/issues/14352 sorry this discussion in Japanese.) and I got a answer which this behavior should not be change.

so I propose to change implementation in Mail gem's Quoted-Printable encoder as follow.

diff --git i/lib/mail/encodings/quoted_printable.rb w/lib/mail/encodings/quoted_printable.rb
index 193778c..6a04833 100644
--- i/lib/mail/encodings/quoted_printable.rb
+++ w/lib/mail/encodings/quoted_printable.rb
@@ -16,11 +16,18 @@ module Mail
       # Decode the string from Quoted-Printable. Cope with hard line breaks
       # that were incorrectly encoded as hex instead of literal CRLF.
       def self.decode(str)
-        str.gsub(/(?:=0D=0A|=0D|=0A)\r\n/, "\r\n").unpack("M*").first
+        str.unpack("M").first
       end

       def self.encode(str)
-        [str].pack("M")
+        [str].pack("M").gsub(/(=0D\n|(?<!=)\n)/) do |match|
+          case match
+          when "=0D\n"
+            "=0D=0A=\r\n"
+          when "\n"
+            "=0A=\r\n"
+          end
+        end
       end

       def self.cost(str)

This change has breaking change, breaks test case. For example.

Failures:

  1) quoted-printable should encode quoted printable from text
     Failure/Error: expect(Mail::Encodings::QuotedPrintable.encode("This is\na test")).to eq result

       expected: "This is\na test=\n"
            got: "This is=0A=\r\na test=\n"

       (compared using ==)

       Diff:
       @@ -1,3 +1,3 @@
       -This is
       +This is=0A=
        a test=

This change is compatibility will be lost, but more safety than before. any idea?

Thanks.

ahorek commented 6 years ago

Hi, I can confirm the bug and the proposed solution works for me.

https://www.ietf.org/rfc/rfc2821.txt

SMTP commands and, unless altered by a service extension, message
data, are transmitted in "lines".  Lines consist of zero or more data
characters terminated by the sequence ASCII character "CR" (hex value
0D) followed immediately by ASCII character "LF" (hex value 0A).
This termination sequence is denoted as <CRLF> in this document.
Conforming implementations MUST NOT recognize or generate any other
character or character sequence as a line terminator.  Limits MAY be
imposed on line lengths by servers (see section 4.5.3).

In addition, the appearance of "bare" "CR" or "LF" characters in text
(i.e., either without the other) has a long history of causing
problems in mail implementations and applications that use the mail
system as a tool.  SMTP client implementations MUST NOT transmit
these characters except when they are intended as line terminators
and then MUST, as indicated above, transmit them only as a <CRLF>
sequence.

@mikel do you have time to review?

ahorek commented 6 years ago

I prepared a pull request and fixed some issues https://github.com/mikel/mail/pull/1246

but 2 issues still remain: 1/ all lines should always contain only \<CRLF> separators 2/ max line length should be always <= 76 or separated by a soft line break "=" (I don't have a real problem reading it, but it violates RFC now)

I tried to fix them, but it breaks binary attachments again. Any idea?

jfelchner commented 6 years ago

I'm also having this issue @mikel