mjl- / mox

modern full-featured open source secure mail server for low-maintenance self-hosted email
https://www.xmox.nl
MIT License

"_xF8FF_" support #112

Open jsfan3 opened 6 months ago

jsfan3 commented 6 months ago

Please take a look at how the following two IMAP folders appear in Outlook and mox webmail:

[Inbox/&DiAOMg4pDjU-66_xF8FF_&DhsOAQ4qDgQ-] [Inbox/&DiAOMg4pDjU-66_xF8FF_&DhsOAQ4qDgQ-/&DhsOAQ4qDgQ-]

Outlook: [screenshot: outlook]

mox: [screenshot: mox webmail]

Basically, "_xF8FF_" stands in for the "/" character when it is used within a folder name.

mjl- commented 6 months ago

Interesting. The string &DiAOMg4pDjU-66_xF8FF_&DhsOAQ4qDgQ- is decomposed into:

  1. &DiAOMg4pDjU- (utf-7)
  2. 66_xF8FF_ (regular text)
  3. &DhsOAQ4qDgQ- (utf-7)

I don't think _xF8FF_ has any special meaning in email protocols. It looks like an attempt at using a unicode character. 0xf8ff is in the "private use" unicode space, see https://en.wikipedia.org/wiki/Private_Use_Areas. If that is indeed intended, I would expect it to be UTF-7-encoded.
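To check the decomposition above, here is a minimal Python sketch of decoding IMAP's modified UTF-7 (RFC 3501, section 5.1.3). The function name `decode_modified_utf7` is made up for illustration and is not part of mox. It assumes well-formed input and does no error recovery.

```python
import base64
import unicodedata


def decode_modified_utf7(name: str) -> str:
    """Decode an IMAP modified UTF-7 mailbox name (hypothetical helper)."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "&":
            # Plain ASCII outside "&...-" is copied as-is, including "_xF8FF_".
            out.append(ch)
            i += 1
            continue
        end = name.index("-", i)  # raises ValueError if the shift is unterminated
        chunk = name[i + 1:end]
        if chunk == "":
            out.append("&")  # "&-" encodes a literal ampersand
        else:
            # Modified base64 uses "," instead of "/" and omits padding.
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


folder = "&DiAOMg4pDjU-66_xF8FF_&DhsOAQ4qDgQ-"
print(decode_modified_utf7(folder))
# U+F8FF is in general category "Co" (private use).
print(unicodedata.category("\uf8ff"))
```

The two `&...-` runs decode to Thai text, while `66_xF8FF_` passes through untouched, which matches the reading that the escape is plain ASCII text rather than an encoded code point.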

I found https://stackoverflow.com/questions/10423116/msexchange-url-encoding, which mentions that Exchange intends it as an escape for a slash (no idea why you would want to have/allow a slash in a directory name...). Since I don't think this is standard behaviour, I don't think we should interpret these characters specially. Perhaps Exchange can be changed to just treat the slash as a path separator, but I doubt it. It's probably best to rename the folder manually after syncing.

jsfan3 commented 6 months ago

I am synchronizing a large email archive from Exchange to mox. The folders that would need renaming are spread across an archive containing hundreds of thousands of folders, so doing it manually is impossible. I was not the one who added a slash to the names of some folders.

Also, I use mox to create an archive that is synchronized with Outlook from time to time, so I cannot edit anything manually, otherwise the same problem would occur the next time I synchronize.

I also did a search and did not find any documentation referring to a standard. I think it is a specific and undocumented behavior of Outlook.

Alternatively, it could be a non-standard and undocumented behavior of Davmail, which imapsync connects to in order to synchronize Outlook with mox.

haraldrudell commented 6 months ago

rfc 9051 declares what the server should be doing

https://datatracker.ietf.org/doc/html/rfc9051#name-mailbox-naming

haraldrudell commented 6 months ago

I am digging through those data types. UTF-7 is deprecated, since IMAP4rev2 uses UTF-8 quoted strings. RFC 9051 enumerates what's wrong with UTF-7, which isn't a Unicode standard. The F8FF suggests the poster is in a UTF-16 environment like Java or ECMAScript, which is also kind of legacy since the world moved on to UTF-8 and UTF-32, like Go. The 16-bit encoding was initially intended to hold all characters but proved to have too few code points, and UTF-8 proved to be more efficient, since ASCII characters take a single byte and it is a superset of US-ASCII.

I think this problem comes down to server configuration:

  a. The server may be able to disable mailbox hierarchy. It is not a required feature. This would probably fail at IMAP SELECT.
  b. The hierarchy character, typically a single ASCII character /, may be made configurable to one or more Unicode characters. This would probably fail at IMAP SELECT.
  c. Do nothing.

It seems the root cause here is that Microsoft allowed slashes, then had to convert those slashes into something that is not a slash and picked an obscure Unicode character. In 2024, that is not Net-Unicode per RFC 9051. RFC 9051 gets around that by declaring the server can do whatever, as long as the server does the same whatever every time and that whatever works recursively. This will work for as long as there is no attempt to map a mox mailbox name back to Exchange.

The poster should probably process the mailbox name so that:

  a. it is valid Unicode and valid Net-Unicode (RFC 5198), and
  b. it gets around the hierarchy character.

In Go, strings and byte slices may contain invalid Unicode; iterating over them yields the replacement character (U+FFFD) where the bad Unicode was.
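The kind of pre-processing suggested above could look roughly like the following Python sketch. The function `clean_mailbox_name`, its parameters, and the choice of "-" as substitute are assumptions for illustration. The checks only approximate Net-Unicode (NFC form, encodable as UTF-8, no control characters) and are not a full RFC 5198 validator.

```python
import unicodedata

ESCAPE = "_xF8FF_"  # the literal ASCII escape seen in the folder names


def clean_mailbox_name(name: str, delimiter: str = "/", substitute: str = "-") -> str:
    """Replace the escape and apply rough Net-Unicode checks (hypothetical helper)."""
    if delimiter in substitute:
        raise ValueError("substitute must not contain the hierarchy delimiter")
    # Replace the escape with something that is not the hierarchy character,
    # so the existing hierarchy is left unchanged.
    name = name.replace(ESCAPE, substitute)
    # Net-Unicode expects NFC-normalized text.
    name = unicodedata.normalize("NFC", name)
    # Lone surrogates cannot be encoded; this raises UnicodeEncodeError.
    name.encode("utf-8")
    for ch in name:
        if unicodedata.category(ch) == "Cc":
            raise ValueError(f"control character U+{ord(ch):04X} in mailbox name")
    return name


print(clean_mailbox_name("Inbox/this_xF8FF_that"))
```

Whether "-" is an acceptable substitute depends on the archive. Any string that avoids the server's delimiter would work the same way.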

mjl- commented 5 months ago

I can imagine the F8FF is meant as a literal slash to indicate "this/or/that", and not meant as a hierarchy separator (directories).

Perhaps the mailboxes can be renamed both at Exchange and mox to e.g. "this or that", or perhaps use some other character to indicate the "or".

If the slash was meant as separator, perhaps the mailboxes can be changed by first creating a new "this" and then renaming "this_xF8FF_that" to "this/that", again in both Exchange and mox. By doing it on both sides, they both get "fixed" and future synchronization shouldn't cause trouble (though it may trigger a full resync of those directories).

In any case, this should probably be done with a little script/program that talks IMAP: list mailbox folders/directories with the xF8FF, and apply the changes.
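A rough Python sketch of such a script, covering the "this or that" case, is below. All names (`plan_renames`, `quote_mailbox`, `apply_renames`) are hypothetical. It assumes the mailbox names are already available in their wire form (modified UTF-7, as returned by LIST), that every parent whose name contains the escape is itself in the list, and that the replacement is plain ASCII without "&". Only `imaplib.IMAP4.rename` is a real library call, and the script would have to be run against both servers.

```python
import imaplib

ESCAPE = "_xF8FF_"


def plan_renames(mailboxes, delimiter="/", replacement=" or "):
    """Return (current_name, new_name) pairs, parents before children."""
    plan = []
    # Shallowest first: RENAME also moves child mailboxes, so by the time a
    # child is handled its parent components have already been renamed.
    for name in sorted(mailboxes, key=lambda n: n.count(delimiter)):
        parent, sep, leaf = name.rpartition(delimiter)
        if ESCAPE not in leaf:
            continue
        current = parent.replace(ESCAPE, replacement) + sep + leaf
        plan.append((current, current.replace(ESCAPE, replacement)))
    return plan


def quote_mailbox(name: str) -> str:
    """Quote a mailbox name; imaplib does not quote names with spaces itself."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def apply_renames(conn: imaplib.IMAP4, plan, dry_run=True):
    """Print the plan, or issue RENAME commands when dry_run is False."""
    for old, new in plan:
        if dry_run:
            print(f"RENAME {quote_mailbox(old)} {quote_mailbox(new)}")
            continue
        typ, data = conn.rename(quote_mailbox(old), quote_mailbox(new))
        if typ != "OK":
            raise RuntimeError(f"rename of {old!r} failed: {data!r}")


names = ["Inbox", "Inbox/this_xF8FF_that", "Inbox/this_xF8FF_that/sub"]
print(plan_renames(names))
```

Starting with `dry_run=True` makes it possible to review the planned commands before anything is changed on either server.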

the-djmaze commented 5 months ago

If you "fix" this, this could still be an issue for other scripts (like imapsync). Especially when the other server disallows RENAME (RFC 4314 ACL lacking the k and x rights): https://www.rfc-editor.org/rfc/rfc4314.html#section-4

Say I run imapsync to synchronize my mail between two servers. Mox has this-that, the other has this_xF8FF_that. imapsync will create this_xF8FF_that and put all messages there.

Also / doesn't have to be the separator, it could be anything (mostly a dot .)
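Since the separator varies per server, a script should read it from the LIST response instead of assuming "/". The Python sketch below parses a single LIST response line as returned in the data of `imaplib`'s `list()` call. The helper `parse_list_line` is hypothetical, and it only handles quoted strings and atoms, not names sent as IMAP literals.

```python
import re

# Matches: (flags) <delimiter or NIL> <mailbox name>
LIST_LINE = re.compile(
    rb'\((?P<flags>[^)]*)\) (?P<delim>NIL|"(?:\\.|[^"\\])") (?P<name>.+)'
)


def _unquote(raw: bytes) -> bytes:
    """Strip surrounding quotes and undo backslash escapes."""
    return re.sub(rb'\\(.)', rb'\1', raw[1:-1])


def parse_list_line(line: bytes):
    """Return (delimiter or None, mailbox name) for one LIST response line."""
    m = LIST_LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"unrecognized LIST response: {line!r}")
    delim = m.group("delim")
    delimiter = None if delim == b"NIL" else _unquote(delim).decode("ascii")
    name = m.group("name")
    if name.startswith(b'"') and name.endswith(b'"'):
        name = _unquote(name)
    return delimiter, name.decode("utf-8")


print(parse_list_line(b'(\\HasNoChildren) "." "INBOX.Sent"'))
```

A NIL delimiter means the server has no hierarchy for that name, in which case only the `_xF8FF_` escape itself needs handling.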