jstedfast / MimeKit

A .NET MIME creation and parser library with support for S/MIME, PGP, DKIM, TNEF and Unix mbox spools.
http://www.mimekit.net
MIT License
1.79k stars 360 forks

Message.Prepare adds '=\r\n' to last line if it doesn't end with a newline #1052

Closed abrguyt closed 1 week ago

abrguyt commented 1 week ago

Using the latest version (4.7.0) of MimeKit with a simple text-body mail message:

Recently I noticed that when preparing a mail with a MimeMessage using 'message.Prepare(EncodingConstraint.SevenBit)', if the last line is longer than the default maximum number of characters per line (78) and does not end with a newline (\r\n), MimeKit seems to erroneously add a '=' symbol (the Quoted-Printable soft line-break symbol) to that last line in the encoded byte stream - which then shows up in the receiving email client.

Adding a simple newline (\r\n) to that last line solves the problem, but I don't think that's how it's supposed to work.

Anyone else experiencing this?

Other than that - Mail/MimeKit is an amazingly powerful and reliable suite; a work of art.

jstedfast commented 1 week ago

The QuotedPrintableEncoder has a comment stating that it adds that "=" to the end in order to prevent any QP decoder from interpreting the following "\r\n" as an actual newline.

This is because the encoded content MUST end with a newline.

I'm going to check the Quoted-Printable specification to verify that this is correct behavior, but it is definitely intentional in the code right now.

jstedfast commented 1 week ago

Okay, so the specification doesn't say how to deal with content that doesn't end with a newline sequence, so I'm not 100% certain how that should be handled, but it seems to me that in order for a decoder to output the exact same content as was fed to an encoder, this needs to be done the way it is currently being done.

In other words, if we have "This is some content that doesn't end with a newline" and it gets QP encoded for MIME transport, then decoding the output of the QP encoder should produce the exact same string content.

The only way for that to happen is for the encoder to terminate content that doesn't end with a newline with "=\r\n".
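As a rough illustration of the round-trip argument above (this is not MimeKit's code), the sketch below uses Python's standard `quopri` module as a stand-in QP decoder. The two hand-built encoded streams and the variable names are assumptions made up for this example.

```python
import quopri

original = b"This is some content that doesn't end with a newline"

# Option A: terminate the encoded stream with a soft line break ("=\r\n").
# A decoder is expected to drop the soft break entirely.
with_soft_break = original + b"=\r\n"

# Option B: terminate the encoded stream with a bare newline ("\r\n").
# A decoder treats this as real content.
with_hard_break = original + b"\r\n"

decoded_a = quopri.decodestring(with_soft_break)
decoded_b = quopri.decodestring(with_hard_break)

print("soft break round-trips exactly:", decoded_a == original)
print("bare newline round-trips exactly:", decoded_b == original)
```

With a conforming decoder, option A should give back the original bytes, while option B gives back the original plus two extra bytes ("\r\n").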

jstedfast commented 1 week ago

@abrguyt is the receiving client not decoding it correctly?

abrguyt commented 1 week ago

I'm using Desktop MS Outlook 2019 here to test; that shows the extra '=' at the end. I'll test with two other email clients today.

abrguyt commented 1 week ago

Just tried with the Gmail and Yandex web clients; neither displays the extra '=', while MS Outlook 2019 on Windows does. The '=\r\n' does indeed get added to that last line - so could this be classified as just an Outlook bug?

The Outlook QP decoder seems to interpret the '=' QP soft line break as a normal character when there is no data after the \r\n that immediately follows the '=' on the last line of text.

When the last line of text exceeds the maximum length and is wrapped to the next encoded line with the '=' symbol, the left-over length of that very last wrapped line will be less than the maximum length - so in that case, is adding that last '=\r\n' really necessary? As there is no continuation of text/data, the '=\r\n' could be skipped - it would also save three bytes.

jstedfast commented 1 week ago

> so could this be classified as just an Outlook bug?

I think so, yes.

> the left-over length of that very last wrapped line will be less than the maximum length - so in that case adding that last '=\r\n' is not really necessary anyway?

The MIME data needs to be canonicalized to end with a newline sequence. Either we place a newline without the leading '=' or we use '=\r\n'.

If we use '\r\n' without the '=', then decoding the content will not result in an exact match of the original content. You might be thinking "okay, but that doesn't really matter because visually it looks the same to the user", but that's wrong: in PDFs, for example, newline sequences have meaning, and a missing or extra newline will corrupt the file.

abrguyt commented 1 week ago

You're right; adding '\r\n' without a preceding '=' might cause the decoder to interpret it as a newline that wasn't originally there. We will just have to live with this MS Outlook bug and manually add '\r\n' to any long last input lines. Thanks for investigating.
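The workaround described above could look something like the following minimal sketch, written in Python for brevity rather than against the MimeKit API. The helper name `ensure_trailing_newline` is hypothetical, and the idea is to apply it to the body text before handing it to the mail library. Note that it deliberately changes the content by appending a newline, which is acceptable for plain-text bodies but not for attachments where exact bytes matter.

```python
def ensure_trailing_newline(body: str, newline: str = "\r\n") -> str:
    """Return body unchanged if it is empty or already ends with a newline;
    otherwise append one so a QP encoder has no need for a final soft break."""
    if not body or body.endswith(("\r\n", "\n")):
        return body
    return body + newline


print(repr(ensure_trailing_newline("A long last line without a newline")))
```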