joeyates / imap-backup

Backup and Migrate IMAP Email Accounts
MIT License

Encoding error when trying to backup #119

Closed: alexg-k closed this issue 2 years ago

alexg-k commented 2 years ago

Hello everyone,

I get the following error when trying to run imap-backup. Is this a bug?

I am running a Docker container with ruby:slim-bullseye as the image. Installation and setup worked as expected.

```
root@laptop:/# imap-backup
D, [2022-05-25T13:02:35.389582 #80] DEBUG -- : Running backup of account: user@xyz.de
D, [2022-05-25T13:02:35.389662 #80] DEBUG -- : Creating IMAP instance: imap.xyz.de, options: {:port=>993, :ssl=>{:ssl_version=>:TLSv1_2}}
D, [2022-05-25T13:02:35.389692 #80] DEBUG -- : Logging in: user@xyz.de/xxxxxxxxxxxxxx
S: * OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA IDLE AUTH=PLAIN STARTTLS] Courier-IMAP ready. Copyright 1998-2017 Double Precision, Inc.  See COPYING for distribution information.
C: RUBY0001 LOGIN user@xyz.de [PASSWORD REDACTED]
S: RUBY0001 OK LOGIN Ok.
D, [2022-05-25T13:02:35.762351 #80] DEBUG -- : Login complete
C: RUBY0002 LIST "" ""
S: * LIST (\Noselect) "." ""
S: RUBY0002 OK LIST completed
C: RUBY0003 LIST "" "*"
S: * LIST (\HasNoChildren) "." "INBOX.Entwürfe"
S: * LIST (\HasNoChildren) "." "INBOX.Deleted Messages"
S: * LIST (\HasNoChildren) "." "INBOX.Entwuerfe"
S: * LIST (\HasNoChildren) "." "INBOX.Junk"
S: * LIST (\HasNoChildren) "." "INBOX.Drafts"
S: * LIST (\HasNoChildren) "." "INBOX.Sent"
S: * LIST (\HasNoChildren) "." "INBOX.Trash"
S: * LIST (\HasNoChildren) "." "INBOX.Spam"
S: * LIST (\HasNoChildren) "." "INBOX.Papierkorb"
S: * LIST (\HasNoChildren) "." "INBOX.Gesendet"
S: * LIST (\HasNoChildren) "." "INBOX.SPAMFOLDER"
S: * LIST (\Unmarked \HasChildren) "." "INBOX"
S: RUBY0003 OK LIST completed
/usr/local/lib/ruby/gems/3.1.0/gems/net-imap-0.2.3/lib/net/imap/data_encoding.rb:32:in `encode': "\\xC3" to UTF-8 in conversion from ASCII-8BIT to UTF-8 to UTF-16BE (Encoding::UndefinedConversionError)
    from /usr/local/lib/ruby/gems/3.1.0/gems/net-imap-0.2.3/lib/net/imap/data_encoding.rb:32:in `block in encode_utf7'
    from /usr/local/lib/ruby/gems/3.1.0/gems/net-imap-0.2.3/lib/net/imap/data_encoding.rb:28:in `gsub'
    from /usr/local/lib/ruby/gems/3.1.0/gems/net-imap-0.2.3/lib/net/imap/data_encoding.rb:28:in `encode_utf7'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/account/folder.rb:122:in `utf7_encoded_name'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/account/folder.rb:109:in `examine'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/account/folder.rb:34:in `exist?'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/account/connection.rb:66:in `block in run_backup'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/account/connection.rb:148:in `block in each_folder'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/account/connection.rb:146:in `each'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/account/connection.rb:146:in `each_folder'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/account/connection.rb:65:in `run_backup'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/cli/backup.rb:16:in `block in run'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/cli/helpers.rb:30:in `block in each_connection'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/cli/accounts.rb:16:in `each'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/cli/accounts.rb:16:in `each'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/cli/helpers.rb:29:in `each_connection'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/cli/backup.rb:15:in `run'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/cli.rb:46:in `backup'
    from /usr/local/bundle/gems/thor-1.2.1/lib/thor/command.rb:27:in `run'
    from /usr/local/bundle/gems/thor-1.2.1/lib/thor/invocation.rb:127:in `invoke_command'
    from /usr/local/bundle/gems/thor-1.2.1/lib/thor.rb:392:in `dispatch'
    from /usr/local/bundle/gems/thor-1.2.1/lib/thor/base.rb:485:in `start'
    from /usr/local/bundle/gems/imap-backup-5.2.0/bin/imap-backup:14:in `block in <top (required)>'
    from /usr/local/bundle/gems/imap-backup-5.2.0/lib/imap/backup/logger.rb:29:in `sanitize_stderr'
    from /usr/local/bundle/gems/imap-backup-5.2.0/bin/imap-backup:13:in `<top (required)>'
    from /usr/local/bundle/bin/imap-backup:25:in `load'
    from /usr/local/bundle/bin/imap-backup:25:in `<main>'
```
joeyates commented 2 years ago

Hi @alexg-k

Thanks for the bug report. I think this error is caused by a server that doesn't follow the standard, rather than by a bug in imap-backup. See below.

The backup fails when trying to convert the mailbox name "Entwürfe" to IMAP's modified UTF-7 encoding.

This mailbox name has just been received from the IMAP server, in response to the command LIST "" "*".

The standard (RFC 3501, section 5.1.3) indicates that mailbox names should be transmitted "over the wire" in modified UTF-7.
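To make the expected wire format concrete, here is a minimal stand-alone sketch of modified UTF-7 encoding in Ruby. It is an illustration written for this thread, not Net::IMAP's or imap-backup's code, and `encode_mailbox_name` is a hypothetical name. Printable ASCII passes through unchanged, a literal "&" becomes "&-", and each run of other characters is converted to UTF-16BE, base64-encoded without padding (with "," in place of "/"), and wrapped in "&" and "-".

```ruby
# Hypothetical helper: encode a UTF-8 mailbox name in IMAP's modified UTF-7
# (RFC 3501, section 5.1.3). A sketch, not Net::IMAP's implementation.
def encode_mailbox_name(name)
  name.gsub(/(&)|[^\x20-\x7e]+/) do
    if Regexp.last_match(1)
      "&-" # a literal "&" is escaped as "&-"
    else
      # Run of non-printable-ASCII characters: convert to UTF-16BE,
      # base64-encode, drop "=" padding and replace "/" with ","
      utf16 = Regexp.last_match(0).encode(Encoding::UTF_16BE)
      "&" + [utf16].pack("m0").delete("=").tr("/", ",") + "-"
    end
  end
end

puts encode_mailbox_name("INBOX.Entwürfe") # INBOX.Entw&APw-rfe
puts encode_mailbox_name("Tom & Jerry")    # Tom &- Jerry
```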

Looking at what the server sent when listing folders, we have:

```
S: * LIST (\HasNoChildren) "." "INBOX.Entwürfe"
```

This seems wrong: the umlaut should have been encoded in modified UTF-7, as follows:

```
S: * LIST (\HasNoChildren) "." INBOX.Entw&APw-rfe
```

Because the name is not encoded as expected, Ruby's Net::IMAP library hands back the string "INBOX.Entw\xC3\xBCrfe" with ASCII-8BIT (binary) encoding.

We can simulate this in the console as follows:

```
irb(main):001:0> require "net/imap"
=> true
irb(main):002:0> s = "INBOX.Entwürfe"
=> "INBOX.Entwürfe"
irb(main):003:0> s.force_encoding("ASCII-8BIT")
=> "INBOX.Entw\xC3\xBCrfe"
irb(main):004:0> Net::IMAP.encode_utf7(s).force_encoding("ASCII-8BIT")
Traceback (most recent call last):
        4: from ...ruby/2.7.5/lib/ruby/2.7.0/net/imap.rb:1012:in `encode_utf7'
        3: from ...ruby/2.7.5/lib/ruby/2.7.0/net/imap.rb:1012:in `gsub'
        2: from ...ruby/2.7.5/lib/ruby/2.7.0/net/imap.rb:1016:in `block in encode_utf7'
        1: from ...ruby/2.7.5/lib/ruby/2.7.0/net/imap.rb:1016:in `encode'
Encoding::UndefinedConversionError ("\xC3" to UTF-8 in conversion from ASCII-8BIT to UTF-8 to UTF-16BE)
```

So I would venture to say that this is a server error.

I would like to see if there is a workaround. imap-backup would need to detect the incorrectly encoded names coming in from the server, and behave accordingly.
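One way to detect such names, sketched under the assumption that UTF-8 mode has not been enabled: a correctly encoded name contains only printable 7-bit ASCII bytes (0x20 to 0x7E), so any other byte marks it as suspect. `wire_safe_name?` is a hypothetical helper, not part of imap-backup or Net::IMAP.

```ruby
# Hypothetical check: does a mailbox name, as received from the server,
# consist only of printable 7-bit ASCII bytes, as modified UTF-7 requires?
def wire_safe_name?(name)
  name.bytes.all? { |byte| byte.between?(0x20, 0x7e) }
end

received = "INBOX.Entw\xC3\xBCrfe".b # binary string, as Net::IMAP returned it here
expected = "INBOX.Entw&APw-rfe"      # what a compliant server would have sent

puts wire_safe_name?(received) # false
puts wire_safe_name?(expected) # true
```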

alexg-k commented 2 years ago

Good catch! One small difference from your simulated case is that the original error contains two backslashes. Maybe add something like a {}.force_encoding("UTF-7") call to catch the exception and make the encoding more robust? Sorry I can't contribute, as this is my first contact with Ruby.

Let me know if I can provide any additional data.

joeyates commented 2 years ago

Hi,

The problem seems to be the bytes received "over the wire", rather than the encoding Ruby attributes to them. Encodings are basically string metadata: applying one or another won't change the bytes in the string, only how the byte sequence is interpreted.

In this case, we are receiving the (decimal) byte sequence [195, 188], which is UTF-8 for "ü". But in an IMAP response we should be receiving modified UTF-7, where no byte has a value above 127, so we're in the realm of "undefined behaviour".
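Both points can be checked in plain Ruby, with no IMAP connection involved. This sketch uses the name from this issue: re-labelling the bytes changes only the encoding metadata, the bytes [195, 188] are the UTF-8 form of U+00FC ("ü"), and both are above 127.

```ruby
received = "Entw\xC3\xBCrfe".b # the bytes as received, labelled ASCII-8BIT

# Re-labelling changes the encoding metadata, not the bytes.
relabelled = received.dup.force_encoding(Encoding::UTF_8)
puts received.bytes == relabelled.bytes # true
puts received.encoding                  # ASCII-8BIT
puts relabelled.encoding                # UTF-8

# [195, 188] is the UTF-8 encoding of "ü" (code point U+00FC).
puts "ü".bytes.inspect         # [195, 188]
puts format("U+%04X", "ü".ord) # U+00FC

# Those bytes lie outside 7-bit ASCII, so they cannot be modified UTF-7.
puts received.bytes.select { |b| b > 127 }.inspect # [195, 188]
```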

The server here is Courier-IMAP and, judging from the copyright message, seems to be a version from 2017 or later. There also seem to have been some recent changes to Courier-IMAP relating to UTF-8.

As an aside, the "IMAP Support for UTF-8" extension (RFC 6855) makes it possible to switch to UTF-8, but that doesn't apply here: the server didn't advertise the "UTF8=ACCEPT" capability and the client hasn't issued the "ENABLE UTF8=ACCEPT" command.
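As a rough sketch, a client could look for that capability in the server greeting. The helper `greeting_capabilities` is hypothetical, and the greeting is the one from the log in this issue (copyright text trimmed). A real client would typically also check the capability list again after logging in, since servers may advertise different capabilities at that point.

```ruby
# Greeting from the log above, copyright text trimmed.
greeting = "* OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE " \
           "THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA IDLE " \
           "AUTH=PLAIN STARTTLS] Courier-IMAP ready."

# Hypothetical helper: extract the capability names from a greeting's
# [CAPABILITY ...] response code, upcased for case-insensitive comparison.
def greeting_capabilities(line)
  match = line.match(/\[CAPABILITY ([^\]]*)\]/i)
  match ? match[1].split.map(&:upcase) : []
end

caps = greeting_capabilities(greeting)
utf8_mode = caps.include?("UTF8=ACCEPT")

puts utf8_mode # false, so mailbox names stay in modified UTF-7
puts "would send: ENABLE UTF8=ACCEPT" if utf8_mode
```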

As a hack, in this specific case, we could decide to handle such sequences as UTF-8, but that is an arbitrary behaviour which isn't guaranteed to work in other situations.
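For comparison, the hack might look like this. It is a hypothetical sketch, not imap-backup's behaviour: it re-labels the raw bytes as UTF-8 only if they form valid UTF-8, and otherwise gives up. The second call shows one of those "other situations": the Latin-1 byte for "ü" (0xFC) is not valid UTF-8. Bytes in other encodings can also happen to be valid UTF-8 and be silently misread.

```ruby
# Hypothetical fallback: treat a binary (ASCII-8BIT) mailbox name as UTF-8
# if, and only if, its bytes form valid UTF-8. Returns nil otherwise.
def guess_utf8_name(raw)
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : nil
end

name = guess_utf8_name("INBOX.Entw\xC3\xBCrfe".b)
puts name          # INBOX.Entwürfe
puts name.encoding # UTF-8

# Latin-1 "ü" is the single byte 0xFC, which is never valid in UTF-8.
puts guess_utf8_name("INBOX.Entw\xFCrfe".b).inspect # nil
```

A name recovered this way could then be passed to `Net::IMAP.encode_utf7`, which handles correctly labelled UTF-8 strings.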

So, in conclusion, I think the first step is to trap the conversion error and skip the folder.
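A minimal sketch of "trap and skip", assuming a list of raw folder names like the ones Net::IMAP returned here. The folder list and the `back_up` stub are hypothetical; the stub only performs a UTF-16BE conversion, roughly the step that raised in the traceback above.

```ruby
# Hypothetical folder names; the problematic one is a binary string,
# as Net::IMAP returned it in this issue.
folders = ["INBOX", "INBOX.Entw\xC3\xBCrfe".b, "INBOX.Entwuerfe", "INBOX.Sent"]

# Stand-in for the real backup: converting to UTF-16BE raises
# Encoding::UndefinedConversionError for binary strings with bytes above 127.
def back_up(name)
  name.encode(Encoding::UTF_16BE)
  "backed up #{name}"
end

backed_up = []
skipped = []
folders.each do |name|
  backed_up << back_up(name)
rescue Encoding::UndefinedConversionError => e
  warn "Skipping folder #{name.inspect}: #{e.message}"
  skipped << name
end
```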

joeyates commented 2 years ago

Hi @alexg-k

I've released a workaround for this problem in version 6.2.1.

Now, folders with encoding errors will be skipped.