rgladwell closed this 2 years ago
@mrzool Can you please verify whether this fixes the issue for you?
Hey @rgladwell, sadly it doesn't, though the character reported in the error message is now different.
With a pathname containing an ü character, on master I get the following error:
An unknown error has occurred [493]: 'ascii' codec can't decode byte 0xc3 in position 42: ordinal not in range(128)
With this last patch, I get:
An unknown error has occurred [494]: 'ascii' codec can't encode character u'\xfc' in position 11: ordinal not in range(128)
EDIT: seems to be a common error.
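The two messages above are the two directions of the same problem: the first comes from decoding UTF-8 bytes as ASCII (0xc3 is the first byte of the UTF-8 encoding of ü), the second from encoding a string containing ü (U+00FC) as ASCII. A minimal sketch that reproduces both on Python 3 follows; the function name `describe_codec_errors` is made up for illustration and is not part of imap-upload, and the exact positions in the messages depend on the input string.

```python
def describe_codec_errors(text):
    """Return the error messages from ASCII-decoding and ASCII-encoding `text`.

    Hypothetical helper for illustration only; not taken from imap-upload.
    """
    messages = []
    try:
        # UTF-8 bytes read back with the ASCII codec: fails on byte 0xc3.
        text.encode("utf-8").decode("ascii")
    except UnicodeDecodeError as exc:
        messages.append(str(exc))
    try:
        # A str containing U+00FC written out with the ASCII codec.
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        messages.append(str(exc))
    return messages


def roundtrip_utf8(text):
    """Encode and decode with an explicit UTF-8 codec, which does not fail."""
    return text.encode("utf-8").decode("utf-8")
```

Calling `describe_codec_errors("archiv/INBOX/Drehbücher")` returns one "can't decode byte 0xc3" message and one "can't encode character" message, while `roundtrip_utf8` returns the path unchanged. Naming the codec explicitly wherever bytes and text are converted is the usual way to avoid the implicit ASCII default.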
@mrzool I fixed this issue for unicode MBOX file names, but not for unicode sub-directory names. Please retry the latest commit and let me know how you get on.
I'm still getting that error, but now it's happening later in the process, and once for every email.
With ec2beaf:
Connecting to mail.your-server.de:993.
Found mailbox at archiv/INBOX/Drehbücher/Drehbücher 2010.mbox/mbox...
An unknown error has occurred [494]: 'ascii' codec can't encode character u'\xfc' in position 11: ordinal not in range(128)
With the last patch b2967f5:
Connecting to mail.your-server.de:993.
Found mailbox at archiv/INBOX/Drehbücher/Drehbücher 2010.mbox/mbox...
Uploading to INBOX.Drehbücher.Drehbücher 2010...
Counting the mailbox (it could take a while for the large one).
1/211 2.6 kB Deutsche Drehbücher Bestellu NG ('ascii' codec can't encode character u'\xfc' in position 24: ordinal not in range(128))
1/211 37.9 kB Re: Abonnement DEUTSCHE DREHB NG ('ascii' codec can't encode character u'\xfc' in position 24: ordinal not in range(128))
1/211 5.1 kB Abonnement DEUTSCHE DREHBÜCHE NG ('ascii' codec can't encode character u'\xfc' in position 24: ordinal not in range(128))
1/211 5.3 kB Abonnement DEUTSCHE DREHBÜCHE NG ('ascii' codec can't encode character u'\xfc' in position 24: ordinal not in range(128))
1/211 5.2 kB Abonnement DEUTSCHE DREHBÜCHE NG ('ascii' codec can't encode character u'\xfc' in position 24: ordinal not in range(128))
1/211 5.2 kB Abonnement DEUTSCHE DREHBÜCHE NG ('ascii' codec can't encode character u'\xfc' in position 24: ordinal not in range(128))
1/211 5.2 kB Abonnement DEUTSCHE DREHBÜCHE NG ('ascii' codec can't encode character u'\xfc' in position 24: ordinal not in range(128))
1/211 5.2 kB Abonnement DEUTSCHE DREHBÜCHE NG ('ascii' codec can't encode character u'\xfc' in position 25: ordinal not in range(128))
...
@mrzool I suspect it's the unicode characters in the email subject lines this time. Do you have an example MBOX file I can use to test locally?
@rgladwell I'll send a sample mbox your way now. Thanks a lot!
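For anyone else who wants to reproduce this locally without real mail, a small test mailbox can be generated with the standard library. The sketch below is an assumption about what a useful fixture looks like, not the sample that was actually sent: it writes one message with a non-ASCII subject into an mbox file whose directory and file names both contain ü. The function name `build_sample_mbox` and the addresses are invented for illustration.

```python
import mailbox
import os
from email.message import EmailMessage


def build_sample_mbox(root):
    """Create a one-message mbox under `root` with 'ü' in its path.

    Hypothetical fixture for local testing; names and addresses are made up.
    """
    folder = os.path.join(root, "Drehbücher")
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "Drehbücher 2010.mbox")

    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Abonnement DEUTSCHE DREHBÜCHER"
    msg.set_content("Test body with an umlaut: ü")

    box = mailbox.mbox(path)  # created if it does not exist yet
    try:
        box.add(msg)
        box.flush()
    finally:
        box.close()
    return path
```

Pointing the function at a temporary directory (for example one from `tempfile.mkdtemp()`) gives a throwaway mailbox that exercises both the pathname and the subject-line cases discussed in this thread.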
@mrzool Which version of python are you using?
@mrzool Also what testing IMAP server are you using? A hosted one or self-hosted? Software?
@rgladwell python -V says Python 3.8.5 on my system (macOS Mojave).
The server is our production mailserver on Hetzner, on a managed server plan. I'm not sure exactly which IMAP implementation they are running, but I mentioned the server capabilities in my previous issue here.
Is this a server issue again?
Not sure. I think the Python IMAP API doesn't handle UTF-8 strings unless the UTF8=ACCEPT capability is enabled. However, your IMAP server doesn't appear to support the ENABLE capability, which is required to enable UTF-8 support on both the client and the server.
Capabilities it does support are:
('IMAP4', 'IMAP4REV1', 'UIDPLUS', 'CHILDREN', 'NAMESPACE', 'THREAD=ORDEREDSUBJECT', 'THREAD=REFERENCES', 'SORT', 'QUOTA', 'IDLE', 'ACL', 'ACL2=UNION')
I'm not an expert on IMAP, so I'm not sure whether this is a standard security configuration or something specific to your install/configuration.
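The capability check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in imap-upload: `can_enable_utf8` and `try_enable_utf8` are hypothetical helper names, and `conn` is assumed to be an already authenticated `imaplib.IMAP4` or `imaplib.IMAP4_SSL` object. The `capabilities` attribute and the `enable()` method are part of the standard `imaplib` module (the latter since Python 3.5).

```python
import imaplib

# Capability list reported by the server in this thread.
SERVER_CAPABILITIES = (
    'IMAP4', 'IMAP4REV1', 'UIDPLUS', 'CHILDREN', 'NAMESPACE',
    'THREAD=ORDEREDSUBJECT', 'THREAD=REFERENCES', 'SORT', 'QUOTA',
    'IDLE', 'ACL', 'ACL2=UNION',
)


def can_enable_utf8(capabilities):
    """True if the server advertises both ENABLE and UTF8=ACCEPT."""
    caps = {cap.upper() for cap in capabilities}
    return "ENABLE" in caps and "UTF8=ACCEPT" in caps


def try_enable_utf8(conn):
    """Switch an authenticated imaplib connection to UTF-8 mode if possible.

    Hypothetical helper: returns False instead of raising when the server
    lacks the required capabilities.
    """
    if not can_enable_utf8(conn.capabilities):
        return False
    typ, _data = conn.enable("UTF8=ACCEPT")
    return typ == "OK"
```

With the list above, `can_enable_utf8(SERVER_CAPABILITIES)` is false because neither ENABLE nor UTF8=ACCEPT is advertised, so a client on this server would have to stay in the default ASCII mode.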
This might be a complicated edge case involving this particular IMAP configuration. What's weird about it, though, is that I already uploaded thousands of emails with unicode subject lines and bodies using imap-upload and never had a single issue with it.
Only pathnames containing unicode chars were causing trouble, which I had worked around until now by cleaning them with detox before the upload. The issue with the subject lines only started with either ec2beaf or b2967f5.
Not sure what to make of that? 🤔
It was a long shot to assume the issue was environmental. I suspect these are limitations of the Python imaplib API. Could be resolved by switching to another Python library, but that would stop this being a stand-alone script.
Sorry, I'm currently busy with other work so haven't had time to take a look at this issue.
If you have the time yourself, I'd be happy to give advice and review code. Otherwise, it may be a while before I can get back round to this.
This seems a likely candidate as an alternative IMAP library:
Script fails with a codec error for mailbox file names with unicode characters:
This fixes this bug and closes:
https://github.com/rgladwell/imap-upload/issues/19