purplesyringa / ZeroMailProxy

SMTP/POP3 protocol for ZeroMail

UTF-8 doesn't work #2

Closed by jlxip 6 years ago

jlxip commented 6 years ago

Hello, there seems to be a bug regarding UTF-8 characters in mail subjects (tested on Thunderbird, GNU/Linux).

[Screenshot: UTF-8 character not displaying correctly]

The body of the mail DOES work with UTF-8 characters, so it's not an alarming bug. The rest of the program works great, congratulations.

jlxip commented 6 years ago

Update: sending a mail with the subject Á shows this debug output:

[Screenshot: Encoded subject]

Which works well in Thunderbird:

But not in ZeroMail itself:

jlxip commented 6 years ago

Update: sending a mail with the body Ñ seems to crash the server and shows this traceback:

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/tmp/a/ZeroMailProxy/pop3/server.py", line 41, in run
    session = Session(conn, Mailbox=self.Mailbox)
  File "/tmp/a/ZeroMailProxy/pop3/session.py", line 12, in __init__
    self.init()
  File "/tmp/a/ZeroMailProxy/pop3/session.py", line 47, in init
    self.ok(getattr(self.transaction, name)(*args))
  File "/tmp/a/ZeroMailProxy/pop3/transaction.py", line 23, in commandStat
    return str(self.mailbox.messageCount()) + " " + str(len(self.mailbox))
  File "/tmp/a/ZeroMailProxy/mailbox.py", line 52, in messageCount
    return len(self.load_messages())
  File "/tmp/a/ZeroMailProxy/mailbox.py", line 48, in load_messages
    messages = {date: Message(int(date), data) for date, data in messages.iteritems()}
  File "/tmp/a/ZeroMailProxy/mailbox.py", line 48, in <dictcomp>
    messages = {date: Message(int(date), data) for date, data in messages.iteritems()}
  File "/tmp/a/ZeroMailProxy/message.py", line 16, in __init__
    bts = array.array("B", bts).tostring().decode("utf8")
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd1 in position 0: invalid continuation byte

Edit: once the server has crashed, killing it (Ctrl+Z, jobs -p, sudo kill -9 [PID]) and starting it again does not help: it crashes again and becomes inaccessible. Removing the contents of the cache directory makes the server runnable again.

Edit 2: it turns out that the byte 0xD1 (Ñ) is not valid UTF-8 at all. chardet.detect('\xd1') reports that it is actually ISO-8859-1. One way to convert it to UTF-8 might be '\xd1'.decode(chardet.detect('\xd1')['encoding']).encode('utf-8').
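
A minimal sketch of the conversion idea from this comment, written for Python 3 with only the standard library. Instead of chardet it assumes a fixed fallback charset (ISO-8859-1). The helper name `to_utf8` and the fallback choice are illustrative assumptions, not part of ZeroMailProxy.

```python
def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return `raw` re-encoded as UTF-8 (hypothetical helper)."""
    try:
        # Already valid UTF-8: decode only to validate, then keep as-is.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in `fallback`.
        return raw.decode(fallback).encode("utf-8")


# The lone byte 0xD1 is "Ñ" in ISO-8859-1; in UTF-8 "Ñ" takes two bytes.
print(to_utf8(b"\xd1"))
print(to_utf8("Ñ".encode("utf-8")))
```

ISO-8859-1 maps every byte to a character, so the fallback never raises, but it can silently produce the wrong text if the input was actually in some other charset.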

purplesyringa commented 6 years ago

The message body is automatically UTF-8-encoded and then charset=utf-8 is set. That's not done for subjects, though.
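
For subjects, the usual mechanism is an RFC 2047 encoded word rather than a charset parameter. A minimal Python 3 sketch using the standard `email.header` module follows; `encode_subject` and `decode_subject` are hypothetical helper names, and this is not necessarily how ZeroMailProxy handles subjects.

```python
from email.header import Header, decode_header


def encode_subject(subject: str) -> str:
    """Encode a subject as an RFC 2047 encoded word (hypothetical helper)."""
    return Header(subject, "utf-8").encode()


def decode_subject(encoded: str) -> str:
    """Reverse of encode_subject, used here to check the round trip."""
    parts = []
    for value, charset in decode_header(encoded):
        if isinstance(value, bytes):
            # Encoded words come back as bytes plus their charset.
            parts.append(value.decode(charset or "ascii"))
        else:
            # Plain, unencoded text comes back as str.
            parts.append(value)
    return "".join(parts)


encoded = encode_subject("Á")
print(encoded)                  # ASCII-only header value
print(decode_subject(encoded))  # the original subject again
```

The encoded value contains only ASCII characters, so it can travel in a mail header unchanged, and a client that understands RFC 2047 decodes it back to the original text.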

jlxip commented 6 years ago

However, sending a mail with strange characters in the body crashes the server. See my last message.

jlxip commented 6 years ago

Wait, the encoding seems to work properly. The character Ñ appears UTF8-encoded in cache/[KEY]/messages.json. The issue seems to be at the time of decoding.

jlxip commented 6 years ago

Indeed. The body is encoded all right:

And it's sent correctly, as can be seen in ZeroMail:

But the program is unable to decode it when refreshing Thunderbird:

purplesyringa commented 6 years ago

There, fixed! Please check if receiving works for you.

jlxip commented 6 years ago

I can't manage to get it working. :thinking: This time the mail isn't sent.

Exception in thread Thread-12:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/tmp/a/ZeroMailProxy/smtp/server.py", line 41, in run
    session = Session(conn, Mailbox=self.Mailbox)
  File "/tmp/a/ZeroMailProxy/smtp/session.py", line 11, in __init__
    self.init()
  File "/tmp/a/ZeroMailProxy/smtp/session.py", line 56, in init
    getattr(handler, handler.raw_handler)(data)
  File "/tmp/a/ZeroMailProxy/smtp/transaction.py", line 63, in handleData
    self.send(self.from_, self.to, self.data)
  File "/tmp/a/ZeroMailProxy/smtp/transaction.py", line 74, in send
    self.mailbox.send(from_, to, data)
  File "/tmp/a/ZeroMailProxy/mailbox.py", line 99, in send
    self.zeromail.send(subject=subject, body=content, to=address, date=timestamp * 1000, sign=sign)
  File "/tmp/a/ZeroMailProxy/zeromail.py", line 187, in send
    secret = self.get_secret(address)
  File "/tmp/a/ZeroMailProxy/zeromail.py", line 148, in get_secret
    secrets_sent = self.load_secrets_sent()
  File "/tmp/a/ZeroMailProxy/zeromail.py", line 134, in load_secrets_sent
    secrets_sent = json.loads(secrets_sent)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer

jlxip commented 6 years ago

Does it work on your machine?

purplesyringa commented 6 years ago

Yeah, it works on my machine.

purplesyringa commented 6 years ago

Does loading via Mail.ZeroNetwork.bit work? Anything in DevTools? According to your log, your data.json is a bit broken.
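
The TypeError in the traceback above is what json.loads raises when it receives something that is not a string, such as None for a missing file. Below is a minimal Python 3 sketch of a guard, assuming that a missing or unparsable value should be treated as an empty dictionary. `load_json_or_default` is a hypothetical helper, not a function from ZeroMailProxy, and the thread does not show how load_secrets_sent actually obtains its input.

```python
import json


def load_json_or_default(raw, default=None):
    """Parse `raw` as JSON, returning `default` when it is missing or invalid."""
    if default is None:
        default = {}
    if not isinstance(raw, (str, bytes, bytearray)):
        # json.loads(None) would raise TypeError at this point.
        return default
    try:
        return json.loads(raw)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return default


print(load_json_or_default(None))
print(load_json_or_default('{"a": 1}'))
```

Silently falling back can hide a genuinely broken data.json, so logging the failure alongside the fallback may be preferable.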

jlxip commented 6 years ago

I don't really know why, but now everything has stopped working. I have created a clean installation of ZeroNet with a brand new zeroid. I have downloaded ZeroMail, created a MailBox, waited for it to sync, and tried again. Same error.

Output with send.py:

Traceback (most recent call last):
  File "send.py", line 38, in <module>
    zeromail.send(subject=subject, body=body, to=to, date=time.time() * 1000)
  File "/tmp/a/ZeroMailProxy/zeromail.py", line 187, in send
    secret = self.get_secret(address)
  File "/tmp/a/ZeroMailProxy/zeromail.py", line 148, in get_secret
    secrets_sent = self.load_secrets_sent()
  File "/tmp/a/ZeroMailProxy/zeromail.py", line 134, in load_secrets_sent
    secrets_sent = json.loads(secrets_sent)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer

jlxip commented 6 years ago

Wait, I have just sent a mail to myself with ZeroMail and the program seems to be working now. I don't understand what is going on.

jlxip commented 6 years ago

That was with an ASCII body. Now I tried to send á and I still get the same error as before the update:

Exception in thread Thread-13:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/tmp/a/ZeroMailProxy/pop3/server.py", line 41, in run
    session = Session(conn, Mailbox=self.Mailbox)
  File "/tmp/a/ZeroMailProxy/pop3/session.py", line 12, in __init__
    self.init()
  File "/tmp/a/ZeroMailProxy/pop3/session.py", line 47, in init
    self.ok(getattr(self.transaction, name)(*args))
  File "/tmp/a/ZeroMailProxy/pop3/transaction.py", line 23, in commandStat
    return str(self.mailbox.messageCount()) + " " + str(len(self.mailbox))
  File "/tmp/a/ZeroMailProxy/mailbox.py", line 52, in messageCount
    return len(self.load_messages())
  File "/tmp/a/ZeroMailProxy/mailbox.py", line 48, in load_messages
    messages = {date: Message(int(date), data) for date, data in messages.iteritems()}
  File "/tmp/a/ZeroMailProxy/mailbox.py", line 48, in <dictcomp>
    messages = {date: Message(int(date), data) for date, data in messages.iteritems()}
  File "/tmp/a/ZeroMailProxy/message.py", line 10, in __init__
    self.body = self.unicode_encode(raw["body"])
  File "/tmp/a/ZeroMailProxy/message.py", line 20, in unicode_encode
    bts = array.array("B", bts).tostring().decode("utf8")
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: invalid continuation byte

purplesyringa commented 6 years ago

That looks like invalid Unicode. Did you send á via mail.zeronetwork.bit or ZeroMailProxy?

jlxip commented 6 years ago

I sent á to myself via ZeroMailProxy, using Thunderbird, and it was received OK in mail.zeronetwork.bit, but when I try to refresh received mail in ZeroMailProxy, that error appears.

purplesyringa commented 6 years ago

Okay, I'll look into this.

purplesyringa commented 6 years ago

In fact, that's not UTF-8. "\xe1" is a valid JavaScript string equal to "á", but Python cannot handle such strings. It looks like I have to find a way to translate JS strings into Unicode.

purplesyringa commented 6 years ago

Wait, it is UTF-8. Somehow Python thinks it is already decoded, so decoding it again doesn't work.

So I now treat UnicodeDecodeError as a sign that the string is already Unicode. At least the server won't crash now, but some incorrect messages may be sent to Thunderbird (though only if they were already incorrect).

Sending Ñ still doesn't work for me, but some problems should be fixed.
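
The fallback described in this comment can be sketched roughly as follows, in Python 3. This is an illustration under stated assumptions, not the actual change made to message.py: it assumes the body arrives as a string in which each character may stand for one byte of UTF-8 data, and `decode_body` is a hypothetical name.

```python
def decode_body(text: str) -> str:
    """Undo a possible bytes-as-characters encoding (hypothetical helper)."""
    try:
        # Treat each character as one byte, then decode those bytes as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either a character is above U+00FF or the bytes are not valid
        # UTF-8: assume the text was already decoded and return it as-is.
        return text


print(decode_body("\xc3\xa1"))  # UTF-8 bytes of "á" stored as two characters
print(decode_body("\xe1"))      # a single character that is already "á"
```

The trade-off matches the caveat in the comment: text that happens to look like valid UTF-8 bytes is always reinterpreted, so a message that was already wrong stays wrong, but the decoder no longer raises.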

purplesyringa commented 6 years ago

Should be fixed now!

purplesyringa commented 6 years ago

@jlxip

jlxip commented 6 years ago

Damn, works better than expected. Even in subjects! Seems pretty stable :+1:

Great job!!!