Encoding messages introduces LF characters.

paho-outreach commented 6 years ago

In #1113, QuotedPrintable.encode was changed from

::Mail::Utilities.to_crlf([::Mail::Utilities.to_lf(str)].pack("M"))

to

[str].pack("M")

The problem I'm running into is that pack introduces LF characters, so encoding a message > 998 characters creates something that is not valid per rfc2822, even if you only had CRLF line-endings before.

I see that #1157 prevents attachments from being forced into quoted-printable, and attachment corruption was the motivation for this change in #1113. Would you be open to restoring the old behavior, or does more need to be done in attachment land first?

Fjan commented 6 years ago

I just ran into another side effect of #1113: The LF line endings are now encoded as =0D in quoted printable messages and I'm seeing a lot of additional empty lines in plain text email messages. I think what happens is that email clients will add a new line for each encoded \r (perhaps it only does this if there is a mix of both \n and \r\n line endings?).

Browsers submit text area fields with \r\n endings in Rails and Ruby uses \n endings by default so there's a going to be a lot of effort to ensure we don't get any mixed line endings in there, I would be in favour of having a way to get the old behaviour back.

jeremy commented 6 years ago

The problem I'm running into is that pack introduces LF characters, so encoding a message > 998 characters creates something that is not valid per rfc2822, even if you only had CRLF line-endings before.

@paho-outreach Could you show an example message or code snippet that demonstrates this?

The LF line endings are now encoded as =0D in quoted printable messages and I'm seeing a lot of additional empty lines in plain text email messages.

@Fjan Could you provide an example?

I've added a pull request which restores \n normalization at #1210. Please give it a shot as well.

vihai commented 6 years ago

This is what is affecting me, it seems to be related to this issue:

> puts ::Mail.new(body: "€\n1234567890123456789012345678901234567890123456789012345678901234567890123\n", charset: 'UTF-8').encoded
Date: Tue, 06 Mar 2018 00:10:47 +0100
Message-ID: <5a9dce77d36b8_17a2ad9b5be30d4315fa@linobis.mail>
Mime-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: quoted-printable

=E2=82=AC
1234567890123456789012345678901234567890123456789012345678901234567890123=

> puts ::Mail.new(::Mail.new(body: "€\n123456789012345678901234567890123456789012345678901234567890123456789012\n", charset: 'UTF-8').encoded).encoded
Date: Tue, 06 Mar 2018 00:12:15 +0100
Message-ID: <5a9dcecfb601c_17a2ad9b5be30d432019@linobis.mail>
Mime-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: quoted-printable

=E2=82=AC=0D
123456789012345678901234567890123456789012345678901234567890123456789012=0D=

> puts ::Mail.new(::Mail.new(::Mail.new(body: "€\n123456789012345678901234567890123456789012345678901234567890123456789012\n", charset: 'UTF-8').encoded).encoded)
Date: Tue, 06 Mar 2018 00:12:37 +0100
Message-ID: <5a9dcee594ee_17a2ad9b5be30d4321a0@linobis.mail>
Mime-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: quoted-printable

=E2=82=AC=0D
123456789012345678901234567890123456789012345678901234567890123456789012=0D=
=0D

After parsing and encoding this specific message two times it ends up with two \r\r that the browser inteprets as two lines.

Also DKIM and S/MIME signatures get invalidated as the message is effectively changed.

Fjan commented 6 years ago

Hi Jeremy, This is an excerpt of a plain text email generated by mail 2.6.6:

Details of the new appointment:
When            : Fri 9/3/2018 15:00 to 16:00
Status          : Approved (unlimited credit) 
ID              : 37211227

This is an excerpt of an email generated by mail 2.7.0:

When            : Fri 9/3/2018 16:00 to 17:00 =0D
Status          : Approved (unlimited credit)  =0D
ID              : 37211228=0D

We can remove the extra \r from the source code, however, some of the fields are provided by the user and browsers submit text area fields with \r\n line endings so it would require filtering all text area fields in every rails app. Note that it is not a problem in HTML messages as those are not white space sensitive. It looks like your fix should do the job, thnx.

paho-outreach commented 6 years ago

Here's a simple example:

require 'mail'

mail = Mail.new do
  # need > 998 characters to trigger line-wrapping
  body 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
end

p mail.encoded

Outputs:

"Date: Tue, 06 Mar 2018 08:09:03 -0800\r\nMessage-ID: <5a9ebd1feab47_135e2b181ec490d025463@worktop.mail>\r\nMime-Version: 1.0\r\nContent-Type: text/plain;\r\n charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\n"

Pre 2.7.0, the line-wrapping characters would be =\r\n.

uhrohraggy commented 6 years ago

Hi, I had opened this within Rails at https://github.com/rails/rails/issues/32836 but was helped to realize this was a separate gem. I used the default mail version 2.7 that comes with rails new, but am still finding this is occurring after upgrading to 2.7.1.rc1. The sample forkable app is at https://github.com/uhrohraggy/email-bug - that app still needs either action_mailer config or a SendGrid account, but in case it's helpful.

uhrohraggy commented 6 years ago

FYI I opened bug in Ruby https://bugs.ruby-lang.org/issues/14741 - correct me if I'm wrong, but [].pack('M') should return CRLF based on the spec, not just LF

nicosticht commented 5 years ago

Updated by shyouhei (Shyouhei Urabe) 9 months ago

See also issue #14352. We have decided to reject this change. The behaviour is now documented.

To those who don't read Japanese let me summarize: Ruby's pack behabiour was from Perl's and Perl also don't emit CRLF. This is supposedly because in UNIX variants, emails are written in LF only and, to convert them to CRLF is done in sendmail(1). We don't want to break things here.

So what's the best solution for mail gem?

Fjan commented 5 years ago

@nicosticht I think we have to wait for https://github.com/mikel/mail/pull/1210 to get merged, but until then downgrading the mail gem to version 2.6.6 works for me

Fjan commented 5 years ago

As far as I can tell the encoded LF produced by the Quoted Printable output of [str].pack("M") does not comply with RFC2045 which defines quoted printable attachments:

A line break in a text body, represented as a CRLF sequence in the text canonical form, must be represented by a (RFC 822) line break, which is also a CRLF sequence, in the Quoted-Printable encoding.

and

Any octet, except a CR or LF that is part of a CRLF line break of the canonical (standard) form of the data being encoded, may be represented by an "=" followed by a two digit hexadecimal representation

The original implementation did this correctly, so I would suggest either reverting or doing [str].pack("M").gsub('=0D',"\r") to be compliant with the standard.

dlouzan commented 4 years ago

@jeremy We have been confronted with exactly this issue when implementing S/MIME signatures for Gitlab. When Unicode characters and quoted-printable is involved, there's a mismatch between the encoded values used for generating the PKCS signature and what is actually generated in the email, that makes the signatures invalid. All of this works when using just ASCII and 7bit.

Our basic approach is an interceptor that does something like below (the incoming message is multipart with text and html parts, file samples available in a link at the end):

def delivering_email(message)
  signed_message = Gitlab::Email::Smime::Signer.sign(
    cert: certificate.cert,
    key: certificate.key,
    data: message.encoded)

  signed_email = Mail.new(signed_message)

  overwrite_body(message, signed_email)
  overwrite_headers(message, signed_email)
end

def overwrite_body(message, signed_email)
  # since this is a multipart email, assignment to nil is important,
  # otherwise Message#body will add a new mail part
  message.body = nil
  message.body = signed_email.body.encoded
end

def overwrite_headers(message, signed_email)
  message.content_disposition = signed_email.content_disposition if signed_email.content_disposition
  message.content_transfer_encoding = signed_email.content_transfer_encoding
  message.content_type = signed_email.content_type
end

I've traced down this to exactly the extra =0D entries incorrectly converted in the decode/encode steps. I was actually able to edit some of our emails and remove those values, and the signatures become valid. But the only solution that is working consistently is downgrading to 2.6.6, which we'd like to avoid.

I also tried some of the patches listed in this ticket, https://github.com/mikel/mail/issues/1325, and your PR https://github.com/mikel/mail/pull/1210. It improves things but does not solve all cases; some of those patches solve the text/html part but I still have issues in the text/plain part. What looks like to be the main problem is that the charset included in the Content-Type seems to be ignored in the crlf conversions by Mail::Encodings::QuotedPrintable class.

For the record, I've checked older email notifications from gitlab.com, and this issue with =0D extra characters has been around for a long time, I think nobody noticed until now because no signatures where involved. It might also be my misunderstanding of the RFC ofc.

In the corresponding Gitlab issue there's also sample message files pre- and post 2.7: https://gitlab.com/gitlab-org/gitlab/issues/197386#note_273117603

Would be really grateful if you could give us your assessment.

Thanks!

dlouzan commented 4 years ago

A sample of the failing behaviour:

require 'mail'

email = Mail.new do
  to      'nicolas@test.lindsaar.net.au'
  from    'Mikel Lindsaar <mikel@test.lindsaar.net.au>'
  subject 'This will break encoded'
  content_type 'text/plain; charset=UTF-8'
  content_transfer_encoding 'quoted-printable'
  body "0123456789012345678901234567890123456789012345678901234567890123456789\n"
end

email.body.raw_source
=> "0123456789012345678901234567890123456789012345678901234567890123456789\r\n"

email.body.encoded
=> "0123456789012345678901234567890123456789012345678901234567890123456789=0D=\n\n"

Mail.new(email).body.encoded
=> "0123456789012345678901234567890123456789012345678901234567890123456789=0D=\n=0D\n"

This would be encoded to 71 chars, though with quoted-printable that becomes 70 + 5:

70 chars for the number chars
the \n at the end is translated to \r\n in raw form, then to =0D\n= because of the conversion of \r to encoded =0D, plus the = for the soft wrap
then the extra \n at the newline soft-wrapped

The endlines for each that should be \r\n are all over broken after that. When afterwards creating a new Mail object with the input of the original object, the same broken conversion is applied a second time. This is what breaks our signatures.

mbedarff commented 4 years ago

I ran into the same issue sending mails with rails and the mail gem in version 2.7.1. Mails containing special chars are sent as quoted printable and all \n new lines are outputted as: ...=0D\r\n

This additional =0D leeds to ugly extra new line in Outlook.

By debugging I found out that the raw souce of the body changed between versions 2.7.0 and 2.7.1: In version 2.7.0 the new lines stay \n, in version 2.7.1 the new lines are converted to \r\n before.

This happens because of calling ::Mail::Utilities.to_crlf on the input string in the initialize function of the Mail::Body class. This was changed with this commit: Restore LF line ending parsing

May be this conversion needs to be undone in the Mail::Encondings::QuotedPrintable.encode function by calling ::Mail::Utilities.to_crlf on str before using pack("M") to encode the raw source.

Fjan commented 4 years ago

This issue is fairly simple: this gem switched to using pack("M") for encoding, but unfortunately there is a bug in Ruby in that means it's not compliant with the RFC with regard to encoding new lines. The bug is marked "won't fix" in ruby because it would be too disruptive to change it after all these years, which I can understand. The only way forward is to revert the change in the gem to work around the new line encoding so it's compliant with the RFC again. This monkey patch does that, you can simply drop it in your initialisers directory to get the old behaviour back.

module Mail
  module Encodings
    class QuotedPrintable < SevenBit
      def self.encode(str)
        ::Mail::Utilities.to_crlf([::Mail::Utilities.to_lf(str)].pack("M"))
      end
    end
  end
end

mbedarff commented 4 years ago

@Fjan Thank you for sharing your monkey patch and explanations.

I converted it to an module that can be prepended to QuotedPrintable and reuses the super method of the original encode method:

module Mail
  module Encodings
    module EncodeQuotedPrintableWithoutCR
      def encode(str)
        ::Mail::Utilities.to_crlf(super(::Mail::Utilities.to_lf(str)))
      end
    end
  end
end

Mail::Encodings::QuotedPrintable.singleton_class.prepend(Mail::Encodings::EncodeQuotedPrintableWithoutCR)

This module is applied as an core extension during initialization of my app.

jeremy commented 3 years ago

Fixed by #1210.

I also tried some of the patches listed in this ticket, #1325, and your PR #1210. It improves things but does not solve all cases; some of those patches solve the text/html part but I still have issues in the text/plain part. What looks like to be the main problem is that the charset included in the Content-Type seems to be ignored in the crlf conversions by Mail::Encodings::QuotedPrintable class.

@dlouzan Could you share the cases that #1210 does not solve for you? Perhaps as separate new issues.

Nitrodist commented 3 years ago

@jeremy can we see a release of 2.7.2, please? We are looking at using the monkey patch above in the mean time.

elalemanyo commented 2 years ago

Hi, I am using the monkey patch and ugly extra new line in Outlook are gone 🙏. But I have noticed some encoding issues, for example ü becomes =C3=BC. Is this normal? Any way to fix this? Thanks

Fjan commented 2 years ago

@elalemanyo That's normal for 7bit encoding, it can't be changed without breaking decoding at the recipient side

elalemanyo commented 2 years ago

@elalemanyo That's normal for 7bit encoding, it can't be changed without breaking decoding at the recipient side

@Fjan ok, but was not like this without the monkey patch, and now I have some failing tests 🤔

mikel / mail

Encoding messages introduces LF characters. #1190