zopefoundation / Products.MailHost

zope.sendmail integration for Zope.
Other
2 stars 7 forks source link

quoted-printable soft line breaks not always correctly recognized #35

Closed d-maurer closed 3 years ago

d-maurer commented 3 years ago

BUG/PROBLEM REPORT (OR OTHER COMMON ISSUE)

https://community.plone.org/t/plone-5-2-3-and-email-encoding-problem/13181 reports a problem which is likely a consequence of Python's quoted-printable implementation's use of \n rather than \r\n as line ending which does not conform to "https://tools.ietf.org/html/rfc2045#section-6.7". As a consequence, soft line breaks are not recognized correctly in some (likely Windows) environments.

This is primarily a Python problem -- but Products.MailHost might be able to work around.

1letter commented 3 years ago

This is broken:

from plone import api
from Products.MailHost.MailHost import MailHost
from transaction import commit
from zope.component.hooks import setSite

setSite(app["Plone"])
portal = api.portal.get()
mh = portal["MailHost"]
body = """
Willkommen John Doe, Ihr Benutzerzugang wurde erstellt. Ihr Benutzername lautet: john.doe1@local.test. Bitte aktivieren Sie diesen, indem Sie  zu http://127.0.0.1:8080/Plone/passwordreset/495e356abda246ebe6da184b4f4c311?userid=john.doe1@local.test gehen. Aktivieren Sie Ihren Zugang bitte vor dem 09.12.2020 12:40.
Mit freundlichen Grüßen
--
Plone """
bbody = body.encode("utf-8")
mh.send(
    bbody,
    "...",
    "...",
    subject="Test",
    charset="utf-8",
    immediate=True,
)
commit()

This is ok:

from plone import api
from Products.MailHost.MailHost import MailHost
from transaction import commit
from zope.component.hooks import setSite

setSite(app["Plone"])
portal = api.portal.get()
mh = portal["MailHost"]
body = """
Willkommen John Doe, Ihr Benutzerzugang wurde erstellt.
Ihr Benutzername lautet: john.doe1@local.test.
Bitte aktivieren Sie diesen, indem Sie 
zu http://127.0.0.1:8080/Plone/passwordreset/495e356abda246ebe6da184b4f4c311?userid=john.doe1@local.test gehen. 
Aktivieren Sie Ihren Zugang bitte vor dem 09.12.2020 12:40.
Mit freundlichen Grüßen
--
Plone """
bbody = body.encode("utf-8")
mh.send(
    bbody,
    "...",
    "...",
    subject="Test",
    charset="utf-8",
    immediate=True,
)
commit()

The length of the Line is the Problem.

d-maurer commented 3 years ago

1letter wrote at 2020-12-3 00:55 -0800:

This is broken: ... In email.encoders you find a definition for function _qencode. Change this from

def _qencode(s):
enc = _encodestring(s, quotetabs=True)
# Must encode spaces, which quopri.encodestring() doesn't do
return enc.replace(b' ', b'=20')

to

def _qencode(s):
enc = _encodestring(s, quotetabs=True)
# Must encode spaces, which quopri.encodestring() doesn't do
return enc.replace(b' ', b'=20').replace("\n", "\r\n")

This should make the quoted printable encoding almost standard conform (the line length might still be too high by 1). Check whether this change fixes your problem.

I have reported the Python problem: "https://bugs.python.org/issue42549".

1letter commented 3 years ago

I have patched the Package here with an alternative policy and here

# MailHost.py#L388
from email import policy
mo = message_from_string(_string_transform(messageText, charset), policy=policy.SMTP)
# MailHost.py#L405
mo['Subject'] = str(Header(subject, charset, errors='replace'))

The correct policy is missing. A look in the Python docs clarified the situation. I think we should use email.policy.SMTP and not the Default Policy.

1letter commented 3 years ago

I have test a little bit the mail creation without any patch Products.MailHost. The only one solution that i found is the follwing code snippet:

LONG_MULTI_LINE = """\
From: {}
To: {}
Content-Type: text/plain
Precedence: bulk

Dear Passengers,

Curabitur aliquet quam id dui posuere blandit. Vivamus magna justo, lacinia eget consectetur sed, convallis at tellus. Vivamus magna justo, lacinia eget consectetur sed, convallis at tellus.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. http://127.0.0.1:8080/test/test/portal_setup/blub Curabitur non nulla sit amet nisl tempus convallis quis ac lectus. Mauris blandit aliquet elit, eget tincidunt nibh pulvinar a.

A little Umlaut: "üüß"

Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Donec velit neque, auctor sit amet aliquam vel, ullamcorper sit amet ligula. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec rutrum congue leo eget malesuada.

thanks, your plone
"""

from_address = api.portal.get_registry_record('plone.email_from_address')
from_name = api.portal.get_registry_record('plone.email_from_name')
sender = formataddr((from_name, from_address))
if parseaddr(sender)[1] != from_address:
    sender = from_address

# EmailMessage Object with the SMTP Policy doesn't destroy the Multiline
msg = message_from_string(LONG_MULTI_LINE.format(sender, to), policy=policy.SMTP)
msg['Subject'] = Header(subject, "utf-8").encode()

# Add Attachments from a namedblobfile
for mailattachment in attachments:
    msg.add_attachment(mailattachment.data, maintype='application', subtype='octet-stream', filename=attachment.filename)    

# Send via MailHost
host = api.portal.get_tool('MailHost')
host.send(
    msg,
    "john.doe.recipient@local.test",
    "john.doe.sender@local.test",
    immediate=False,
    charset='utf-8',
    msg_type='text/plain'
)
d-maurer commented 3 years ago

1letter wrote at 2020-12-7 05:29 -0800:

I have test a little bit the mail creation without any patch Products.MailHost. The only one solution that i found is the follwing code snippet:

I have read the email.policy documentation this morning. It tells that the standard policy is not RFC (more precisely MIME) compatible. As Products.MailHost targets mail delivery, the messages it creates should by MIME compatible. Thus, Products.MailHost should change as you have proposed.

Tests are a likely problem. Those tests mockup a later delivery stage and test against the internal messages passed on to that stage. As a consequence, most tests would need adaptations. This applies to the tests of Products.MailHost (easy) but also to tests elsewhere -- e.g. Plone/application tests.

1letter commented 3 years ago

Yes, i agree. But who tell this the community? ;-)

d-maurer commented 3 years ago

@mauritsvanrees This issue demonstrates that Products.MailHost does not produce standard conform messages which can cause problems in some environments (more precisely with SMTP agents which treat non standard messages wrongly). The fix is likely simple but could break email related tests -- in a way similar you have observed in "https://github.com/plone/buildout.coredev/pull/670#issuecomment-673517109". Do you have suggestion how we should handle this issue?

1letter commented 3 years ago

I have made a test via a local postfix mailer - all is fine. My other Tests with the encoding problem going all out via smtp and ms exchange.

mauritsvanrees commented 3 years ago

@d-maurer If you make a PR for Products.MailHost, I can start Plone Jenkins jobs to test it, or run some locally to get a general idea.

Or you may be able to start them yourself:

d-maurer commented 3 years ago

Maurits van Rees wrote at 2020-12-11 02:37 -0800:

@d-maurer If you make a PR for Products.MailHost, I can start Plone Jenkins jobs to test it, or run some locally to get a general idea.

Thank you for your response.

I will let @1letter make the PR (unlike me, he can verify that everything works in a "hostile" environment). I will help as necessary.

1letter commented 3 years ago

@d-maurer Ok, i have patched some lines in Products.MailHost and Products.CMFPlone locally. But some Tests failing. My problem, i don't know enough the internals of headers mailconstruction. for example

# that is in a test of Products.CMFPlone
self.assertEqual(msg['Subject'], '=?utf-8?q?Password_reset_request?=')

with the policy.smtp in _munge_headers

if subject:
    # remove any existing header otherwise we get two
    del mo['Subject']
    # Perhaps we should ignore errors here and pass 8bit strings
    # on encoding errors
    header_subject = Header('None ASCII Char Töst', charset, errors='replace')        
    mo['Subject'] = header_subject.encode()
    import pdb;
    pdb.set_trace()
-> if mto:
(Pdb) pp mo['Subject']
'None ASCII Char Töst'
(Pdb) pp header_subject.encode()
'=?utf-8?q?None_ASCII_Char_T=C3=B6st?='
(Pdb) pp type(mo['Subject'])
<class 'email.headerregistry._UniqueUnstructuredHeader'>
  1. there is a diffrence between the string and the emailmessage subject
  2. in the email.headerregistry it happens something. but i don't know what.
  3. should i refactor the test above to:
self.assertEqual(msg['Subject'], 'None ASCII Char Töst')

it would help refactor the tests, but i don't know is this the right way. In other words: what ist the correct construction of the subject line? Subject: '=?utf-8?q?None_ASCII_Char_T=C3=B6st?=' or Subject: None ASCII Char Töst

have you a hint or suggestion?

d-maurer commented 3 years ago

1letter wrote at 2021-1-10 12:46 -0800:

@d-maurer

... it would help refactor the tests, but i don't know is this the right way. In other words: what ist the correct construction of the subject line?

What you have have changed should only affect line endings -- nothing inside a header itself (unless the header contained line endings or was split due to overlength).

Formerly, line endings could be "\n"; with your change line endings should be "\r\n". Nothing else should change due to your modification.

-- Dieter

d-maurer commented 3 years ago

Dieter Maurer wrote at 2021-1-10 23:25 +0100:

1letter wrote at 2021-1-10 12:46 -0800:

@d-maurer

... it would help refactor the tests, but i don't know is this the right way. In other words: what ist the correct construction of the subject line?

What you have have changed should only affect line endings -- nothing inside a header itself (unless the header contained line endings or was split due to overlength).

Formerly, line endings could be "\n"; with your change line endings should be "\r\n". Nothing else should change due to your modification.

I may understand why more changes: Before your change, MailHost has used a policy derived from Compat32 (this stands for "compatibility with Python 3.2"). From previous discussions I suppose that you now derive from the SMTP policy which is not based on Compat32. The documentation tells that Compat32 exists for compatibility reasons but has known bugs (likely related to standard conformance). Thus, the implicit move from Compat32 may introduce changed behavior.

I have not understood from your previous description what goes wrong with the test. Do you get an exception or a failure? What failure exactly?

Apart from the strange None at the start of the reported header value, the value treatment looks correct.

1letter commented 3 years ago

The "None" is only a word, it can be everything, like "car", "house, "color" - it's dummy text for test. I only changed the call of helper function message_from_string(...., policy=policy.SMTP) in some cases in Products.MailHost and Products.CMFPlone. The effect of this change:

Before change, the subject header contain this: '=?utf-8?q?my subject?=' After change, the subject header contain this:my subject, without the utf-8 stuff

My Question: Which of both is the correct Headerform? If is the second, then i must rewrite the Tests in Products.CMFPlone. There are Test with checks like this self.assertEqual(msg['Subject'], '=?utf-8?q?my subject?='). Do you know what i mean?

d-maurer commented 3 years ago

1letter wrote at 2021-1-10 22:41 -0800:

The "None" is only a word, it can be everything, like "car", "house, "color" - it's dummy text for test. I only changed the call of helper function message_from_string(...., policy=policy.SMTP) in some cases in Products.MailHost and Products.CMFPlone. The effect of this change:

Before change, the subject header contain this: 'my subject' After change, the subject header contain this:my subject, without the utf-8 stuff

My Question: Which of both is the correct Headerform? If is the second, then i must rewrite the Tests in Products.CMFPlone. There are Test with checks like this self.assertEqual(msg['Subject'], 'my subject'). Do you know what i mean?

Both are equivalent. Because my subject is ASCII only, the "utf-8" stuff is not necessary and therefore maybe omitted with the more modern policy.

Header parts must be encoded when they contain non ASCII characters (and some special constructs that might be misinterpreted as an encoding). The encoding must start and end at word boundaries. A header line (which may contain an encoded part) must respect the maximal line length. There is a special treatment for overlong words requiring encoding. Apart from those limitations, the standard gives much freedom regarding encoding decisions. E.g. ASCII words may or may not be encoded (even though not encoding them is more readable and therefore preferable).

I see two options:

  1. stay with the more modern policy and rewrite the failing tests

  2. use a new policy derived from Compat32 with SMTP line endings (then you should get as only difference the standard conform line endings)

1letter commented 3 years ago

I will check this: compat_SMTP = policy.compat32.clone(linesep='\r\n')

1letter commented 3 years ago

With the changes above, the results of sending via localhost and exchange are fine. The Mails aren't corrupt. Another stupid question: How can i run the Tests in Product.MailHost? bin/test -s ProductsMailHost -t AllHeaders doesn't work in my coredev buildout installation. If i use python3 -m unittest tests/testMailHost.py i get an six import error.

jugmac00 commented 3 years ago

I use tox to run tests.

e.g. tox -e py38 -- -t testAllHeaders

Also see https://github.com/jugmac00/til/blob/master/zope/how-to-run-a-single-test.md

d-maurer commented 3 years ago

1letter wrote at 2021-1-11 00:42 -0800:

Another stupid question: How can i run the Tests in Product.MailHost?

Most zopefoundation tests are designed for zope.testrunner (which likely is activated via your bin/test). I assume that the problems you have observed with bin/test comes from a limited eggs definition for the respective part.

1letter commented 3 years ago

Ok, thanks. I forgot to add Products.MailHost to the test-egg section.

pbauer commented 3 years ago

@1letter can you please push your changes to a branch (or branches if you change mored than one paackage). I most likely ran into the same issue and would like to test your fix.

1letter commented 3 years ago

@pbauer I'm still rewriting the tests. Some fails yet. I can push a WIP branch for Products.MAilHost and Products.CMFPlone.

1letter commented 3 years ago

@pbauer i have pushed my branches to the repositories Products.MailHost and Products.CMFPlone. @d-maurer one test show up an error but didn't fail:

 Error in test testExplicit8bitEncoding (Products.MailHost.tests.testMailHost.TestMailHost)
Traceback (most recent call last):
  File "/usr/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/home/Development/coredev/src/Products.MailHost/src/Products/MailHost/tests/testMailHost.py", line 567, in testExplicit8bitEncoding
    mailhost.send('Date: Sun, 27 Aug 2006 17:00:00 +0200\n\n'
  File "/home/Development/coredev/src/Products.MailHost/src/Products/MailHost/MailHost.py", line 214, in send
    msg, mto, mfrom = _mungeHeaders(messageText, mto, mfrom,
  File "/home/Development/coredev/src/Products.MailHost/src/Products/MailHost/MailHost.py", line 465, in _mungeHeaders
    return as_bytes(mo), mto, mfrom
  File "/home/Development/coredev/src/Products.MailHost/src/Products/MailHost/MailHost.py", line 536, in as_bytes
    return msg.as_bytes()
  File "/usr/lib/python3.8/email/message.py", line 178, in as_bytes
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.8/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib/python3.8/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.8/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.8/email/generator.py", line 432, in _handle_text
    super(BytesGenerator,self)._handle_text(msg)
  File "/usr/lib/python3.8/email/generator.py", line 249, in _handle_text
    self._write_lines(payload)
  File "/usr/lib/python3.8/email/generator.py", line 158, in _write_lines
    self.write(lines[-1])
  File "/usr/lib/python3.8/email/generator.py", line 406, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)

    32/35 (91.4%)

Error in test test_8bit_special (Products.MailHost.tests.testMailHost.TestMailHostQueuing)
Traceback (most recent call last):
  File "/usr/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/home/Development/coredev/src/Products.MailHost/src/Products/MailHost/tests/testMailHost.py", line 765, in test_8bit_special
    mh.send(u"""\
  File "/home/Development/coredev/src/Products.MailHost/src/Products/MailHost/MailHost.py", line 214, in send
    msg, mto, mfrom = _mungeHeaders(messageText, mto, mfrom,
  File "/home/Development/coredev/src/Products.MailHost/src/Products/MailHost/MailHost.py", line 465, in _mungeHeaders
    return as_bytes(mo), mto, mfrom
  File "/home/Development/coredev/src/Products.MailHost/src/Products/MailHost/MailHost.py", line 536, in as_bytes
    return msg.as_bytes()
  File "/usr/lib/python3.8/email/message.py", line 178, in as_bytes
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.8/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib/python3.8/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.8/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.8/email/generator.py", line 432, in _handle_text
    super(BytesGenerator,self)._handle_text(msg)
  File "/usr/lib/python3.8/email/generator.py", line 249, in _handle_text
    self._write_lines(payload)
  File "/usr/lib/python3.8/email/generator.py", line 155, in _write_lines
    self.write(line)
  File "/usr/lib/python3.8/email/generator.py", line 406, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 0: ordinal not in range(128)

    33/35 (94.3%)/home/Development/coredev/eggs/pyScss-1.3.7-py3.8-linux-x86_64.egg/scss/types.py:6: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  from collections import Iterable
/home/Development/coredev/eggs/pyScss-1.3.7-py3.8-linux-x86_64.egg/scss/namespace.py:172: DeprecationWarning: inspect.getargspec() is deprecated since Python 3.0, use inspect.signature() or inspect.getfullargspec()
  argspec = inspect.getargspec(function)
/home/Development/coredev/eggs/pyScss-1.3.7-py3.8-linux-x86_64.egg/scss/selector.py:26: FutureWarning: Possible nested set at position 329
  SELECTOR_TOKENIZER = re.compile(r'''

  Ran 35 tests with 0 failures, 2 errors, 2 skipped in 4.692 seconds.
Running Products.GenericSetup.testing.ExportImportZCMLLayer tests:
  Tear down zope.testrunner.layer.UnitTests in 0.000 seconds.
  Set up Testing.ZopeTestCase.layer.ZopeLite in 0.000 seconds.
  Set up Products.GenericSetup.testing.ExportImportZCMLLayer in 0.000 seconds.
  Running:

  Ran 12 tests with 0 failures, 0 errors, 0 skipped in 0.009 seconds.
Tearing down left over layers:
  Tear down Products.GenericSetup.testing.ExportImportZCMLLayer in 0.002 seconds.
  Tear down Testing.ZopeTestCase.layer.ZopeLite in 0.000 seconds.
Total: 47 tests, 0 failures, 2 errors, 2 skipped in 5.051 seconds.

Have you an Idea or can i ignore this?

d-maurer commented 3 years ago

1letter wrote at 2021-1-11 04:07 -0800:

... Ran 12 tests with 0 failures, 0 errors, 0 skipped in 0.009 seconds. Tearing down left over layers: Tear down Products.GenericSetup.testing.ExportImportZCMLLayer in 0.002 seconds. Tear down Testing.ZopeTestCase.layer.ZopeLite in 0.000 seconds. Total: 47 tests, 0 failures, 2 errors, 2 skipped in 5.051 seconds.


Have you an Idea or can i ignore this?

You should definitely not ignore tests with an error.

The errors occur in the as_bytes method for messages which use 8bit encoding. Apparently, your change broke the message handling for this case.

Logically, for messages with 8bit encoding, the message body is a sequence of bytes which does not need to be "transfer encoded" (e.g. via "quoted-printable" or "base64") but can be transfered directly.

Internally, Python's email package represents messages under Python 3 always via str; to represent a byte, a corresponding surrogate unicode character is used. The as_bytes method contains special logic to detect whether the original message was obtained from a bytes sequence or from a str. In the former case, the original message body is restored; otherwise, a charset must be applied. Somehow, your change seems to have disturbed the handling.

Note that the "\r\n" line endings are (likely) important only for the as_bytes method. I have seen that you put your special policy also in the message_from_string function. Many messages presented to a MailHost will not have such line endings. I do not know whether your policy will pose problems for such messages.

1letter commented 3 years ago

@d-maurer Ok now i understand the code snippet at the end of MailHost.py. I refactor my code (delete all changes) and only add the policy to the fixed_policy. All Test green now. I will update my branch. @pbauer the patch of Products.CMFPlone is not needed, i deleted my branch.

Edit: I test the mailing via my Test-Mailservers and all Mails are not corrupt.

1letter commented 3 years ago

@d-maurer, @jugmac00 Thanks for your help!

pbauer commented 3 years ago

@1letter thanks for the pull-request! I deployed it and it solved the issue for me as well (Plone 5.2.3, Python 3.8, Exchange Mailserver).

mauritsvanrees commented 1 year ago

I am seeing this problem in a Plone 5.2.10 site on Python 3.8. Specifically the contact form has the originally reported problem, with text like "y=ur" instead of "your". The mail test form works, sending a password reset request works. The mail server is Office365, which may be part of the problem, as I do not see the problem locally.

Is anyone else seeing this currently?

I made a patch for the client, and that works. So if others are seeing this, I can make a PR. It is Plone specific though. In the contact form code we call:

message = MIMEText(message, 'plain', encoding)

Here message is an encoded string (well, bytes then). In my patch I change this to:

from Products.MailHost.MailHost import message_from_string
message = message_from_string(message.decode(encoding))

Then it works.

@d-maurer I am inclined to conclude that any code in Plone (or elsewhere in Zope projects) that uses email.message_from_string should import the fixed function from Products.MailHost instead. Does that in general sound like a good idea?

The same is probably true for any code that uses MIMEText to create a message, like in the contact form, but that might depend on how this is used.

mauritsvanrees commented 1 year ago

I have opened an issue in Plone to track this: https://github.com/plone/Products.CMFPlone/issues/3754 Advice is welcome.

d-maurer commented 1 year ago

@d-maurer I am inclined to conclude that any code in Plone (or elsewhere in Zope projects) that uses email.message_from_string should import the fixed function from Products.MailHost instead. Does that in general sound like a good idea?

Yes -- up to the time, https://github.com/python/cpython/issues/85479 gets fixed.