mrzool closed 3 years ago
@mrzool Do you have a test account I could use to test the script? If you have something like LastPass you can send the password securely, and delete the mailbox once this PR is closed.
@rgladwell Will comment with some more feedback asap.
Hey @rgladwell, I apologize for the delay, I've been very busy.
As briefly as possible: I don't think this is a problem with the mail server. I think the error happens locally because of some weird encoding issue related to how Mail.app exports these mbox files. The issue seems to be caused exclusively by some file names, and not by the content of the files. I've spent quite some time testing and researching but I couldn't figure it out. I've hit a dead end. Here's what I found out.
The files causing the error look fine when listed with ls:
$ ls
Drehbücher 2008.mbox Drehbücher 2013.mbox Drehbücher 2018.mbox
Drehbücher 2009.mbox Drehbücher 2014.mbox Drehbücher 2019.mbox
Drehbücher 2010.mbox Drehbücher 2015.mbox Drehbücher 2020.mbox
Drehbücher 2011.mbox Drehbücher 2016.mbox ONLINE - nachträglich.mbox
Drehbücher 2012.mbox Drehbücher 2017.mbox
But if I list the contents of the same directory with another utility, like tree, I notice that something is off:
$ tree -L 1
.
├── Drehbu?\210cher\ 2008.mbox
├── Drehbu?\210cher\ 2009.mbox
├── Drehbu?\210cher\ 2010.mbox
├── Drehbu?\210cher\ 2011.mbox
├── Drehbu?\210cher\ 2012.mbox
├── Drehbu?\210cher\ 2013.mbox
├── Drehbu?\210cher\ 2014.mbox
├── Drehbu?\210cher\ 2015.mbox
├── Drehbu?\210cher\ 2016.mbox
├── Drehbu?\210cher\ 2017.mbox
├── Drehbu?\210cher\ 2018.mbox
├── Drehbu?\210cher\ 2019.mbox
├── Drehbu?\210cher\ 2020.mbox
└── ONLINE\ -\ nachtra?\210glich.mbox
14 directories, 0 files
tree supports non-ASCII characters with no issues. If I create a replica of this directory structure elsewhere, tree has no problem displaying the file names properly.
$ mkdir Drehbücher\ {2008..2020}.mbox && mkdir ONLINE\ -\ nachträglich.mbox
$ tree -L 1
.
├── Drehbücher\ 2008.mbox
├── Drehbücher\ 2009.mbox
├── Drehbücher\ 2010.mbox
├── Drehbücher\ 2011.mbox
├── Drehbücher\ 2012.mbox
├── Drehbücher\ 2013.mbox
├── Drehbücher\ 2014.mbox
├── Drehbücher\ 2015.mbox
├── Drehbücher\ 2016.mbox
├── Drehbücher\ 2017.mbox
├── Drehbücher\ 2018.mbox
├── Drehbücher\ 2019.mbox
├── Drehbücher\ 2020.mbox
└── ONLINE\ -\ nachträglich.mbox
14 directories, 0 files
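One possible explanation for the difference between the two tree listings, offered as a sketch and not as a confirmed diagnosis: the same visible name can be stored in two Unicode forms. The `?\210` in the first listing is consistent with a decomposed name (a plain `u` followed by the combining diaeresis U+0308, whose UTF-8 bytes are 0xCC 0x88, and 0x88 is octal 210), while a name typed in the shell is often precomposed (U+00FC). That Mail.app writes decomposed names is an assumption here; the snippet only shows how the two forms differ at the byte level.

```python
import unicodedata

# Precomposed form: "ü" is the single code point U+00FC.
name = "Drehb\u00fccher 2008.mbox"

nfc = unicodedata.normalize("NFC", name)  # composed: ü
nfd = unicodedata.normalize("NFD", name)  # decomposed: u + U+0308

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes)          # b'Drehb\xc3\xbccher 2008.mbox'
print(nfd_bytes)          # b'Drehbu\xcc\x88cher 2008.mbox'
print(oct(nfd_bytes[7]))  # 0o210, the second byte of the combining diaeresis
print(nfc == nfd)         # False: the strings look identical but differ
```

If this is what is happening, normalizing both names to the same form before comparing them makes them equal again.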
git is also unable to display those umlauts correctly, although it uses a different escape sequence.
$ git init
Initialized empty Git repository in [path]
$ git st
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
"Drehb\303\274cher 2008.mbox/"
"Drehb\303\274cher 2009.mbox/"
"Drehb\303\274cher 2010.mbox/"
"Drehb\303\274cher 2011.mbox/"
"Drehb\303\274cher 2012.mbox/"
"Drehb\303\274cher 2013.mbox/"
"Drehb\303\274cher 2014.mbox/"
"Drehb\303\274cher 2015.mbox/"
"Drehb\303\274cher 2016.mbox/"
"Drehb\303\274cher 2017.mbox/"
"Drehb\303\274cher 2018.mbox/"
"Drehb\303\274cher 2019.mbox/"
"Drehb\303\274cher 2020.mbox/"
"ONLINE - nachtr\303\244glich.mbox/"
nothing added to commit but untracked files present (use "git add" to track)
So, to sum it up, it looks like imap-upload is choking on those files because of some dumb encoding/escaping issue with the filenames that I can't quite figure out. Mail.app seems to be the culprit, as those files come straight out of that app.
Do you have any idea for a fix or workaround? I'm out of ideas.
Hey @rgladwell, afraid I need to rectify most of what I've said above.
I just found out that it's perfectly normal for git to display UTF-8 pathnames using octal notation.
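To illustrate why the octal output is harmless: `\303\274` is just the two UTF-8 bytes 0xC3 0xBC, which decode to "ü". The helper below is a hypothetical sketch (the name `decode_git_quoted` is made up for illustration) and only handles three-digit octal escapes, not git's other quoting rules such as `\"` or `\\`.

```python
import re

def decode_git_quoted(path: str) -> str:
    """Turn octal escapes like \\303\\274 back into the UTF-8 text they encode."""
    # Replace each \NNN escape with the character whose code is that byte value,
    # then reinterpret those byte values as UTF-8.
    as_latin1 = re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), path)
    return as_latin1.encode("latin-1").decode("utf-8")

print(decode_git_quoted(r"Drehb\303\274cher 2008.mbox/"))
print(decode_git_quoted(r"ONLINE - nachtr\303\244glich.mbox/"))
```

git can also be told to print such paths unescaped by setting `core.quotePath` to `false`.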
I also tested imap-upload with the directory structure I manually created above (the one that gets correctly displayed by tree) and it fails in the same way:
$ imap_upload.py -r . imaps://testing@example.com:password@mail.your-server.de:993
Connecting to mail.your-server.de:993.
An unknown error has occurred [493]: 'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)
So maybe Mail.app is not the culprit after all, and imap-upload might be generally unable to handle non-ASCII pathnames?
EDIT: Just tested it with my Gmail account, it fails in the same way after connecting to the server:
$ imap_upload.py --gmail -r . --user=[my_username] --password=[application_specific_password]
Connecting to imap.gmail.com:993.
An unknown error has occurred [493]: 'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)
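For reference, this message is what Python produces when UTF-8 bytes containing a non-ASCII character are decoded with the ASCII codec; 0xc3 is the first byte of "ü" and "ä" in UTF-8. The sketch below reproduces the message with a sample path. Where exactly imap_upload.py performs such a decode is not known from this thread, so this is an assumption about the mechanism, and the reported position (24 above) depends on the actual path being processed.

```python
# A sample path as raw UTF-8 bytes, the way a file system call may return it.
path_bytes = "./Drehb\u00fccher 2008.mbox".encode("utf-8")

message = ""
try:
    # Decoding as ASCII fails at the first byte above 0x7f.
    path_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

# Decoding with the encoding the bytes are actually in succeeds.
decoded = path_bytes.decode("utf-8")
print(decoded)
```

The general remedy for this class of error is to decode path bytes explicitly as UTF-8 (or the file system encoding) before mixing them with text.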
Thanks for the info, taking a look now.
Closed by #20
When an mbox or directory name in a scanned path contains a non-ASCII character, the script fails with the following error:
Tested with this folder structure:
Happy to test further and provide more feedback if needed.