Open mhahnenberg opened 6 years ago
I can make these changes if all of this makes sense, I just wanted to discuss it first and make sure I'm understanding everything correctly.
All is reasonable. I would prefer the soft line breaks option. @b0d0nne11 what do you think?
After some more investigation it appears quopri.encodestring
does the correct thing with line lengths > 76, so that removes the need to handle the 2nd bullet. I assumed that it didn't bother with line length-related things because the documentation for the quopri
module didn't explicitly mention it.
That leaves just the first bullet, which is a much easier change 😄
Okay, so I did even more investigation and I think the original problem I described doesn't exist. I think the actual issue is that when we first scan a MIME string the resulting MimePart(s) return False
for was_changed()
, so when we call message.to_string()
we just return the string as it was scanned in without re-encoding it. This avoids all the logic of detecting which CTE should be used. The following test demonstrates this:
def text_long_line_quoted_printable():
# TEXT_LONG_LINES is a string containing a MIME message with lines
# longer than allowed for quoted-printable.
message = scan(TEXT_LONG_LINES)
# Without the following lines this test will fail.
# message._container._body_changed = True
# assert message.was_changed(ignore_prepends=True)
print(message.to_string())
eq_('quoted-printable', message.content_encoding.value)
eq_('ascii', message.content_type.get_charset())
Is this the desired behavior?
@mhahnenberg the idea is that if we do not need to make changes to the mime the we return it as is. This is intentional. Originally we did encode message in to_string
even if it was not changed, but some customers complained (do not remember why though) and we introduced this is_changed
behavior.
@anton-efimenko may be you remember what why customers complained?
You're going to laugh at me, and I'm really sorry for continually changing what this issue is about, but this has been a very tricky bug for me to figure out 😞 However, I think I might have it now (hopefully).
flanker
hardcodes line endings as CRLF
by default. To encode messages as quoted-printable
flanker
uses quopri
which tries to automatically detect which line ending to use for soft line breaks based on its input (see https://bugs.python.org/issue20121). quopri
must generate soft line breaks for long lines to obey the 76 character limit imposed by the quoted-printable
standard.
By default quopri
assumes LF
unless it runs into a CRLF
first. flanker
encodes each part's body in isolation, so if a part's body contains only LF
line endings (or no line endings at all) quopri
will assume that it should use =\n
for soft line breaks when in reality we always want =\r\n
since CRLF
is hardcoded in flanker
and it is unwise to mix line endings.
I think a quick fix for this would be for flanker to trick quopri by prepending a CRLF to the part's body and then removing it after the encoding has been performed.
RFC1341 states: "The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long."
If we pass a long
7bit
text part with simple'\n'
in it, it will be encoded asquoted-printable
which leads to invalid output. This is due to two things:has_long_lines()
usesstr.splitlines()
, which splits universal newlines (which include'\n'
), thus reducing the perceived length of the lines (even though they're not considered line breaks in the finalquoted-printable
CTE). Instead I believe it should usestr.split('\r\n')
.Even after fixing this, we still pass
quoted-printable
as the preferred encoding inside_choose_text_encoding
even though the line length is too long. I think we should either preferbase64
for text parts with long lines (simpler implementation, butbase64
incurs a higher overhead) or convert the long lines to aquoted-printable
compatible form using "soft" line breaks.Relevant code is here: https://github.com/mailgun/flanker/blob/master/flanker/mime/message/part.py#L657