thunderbird / import-export-tools-ng

Import Export Tools that supports Thunderbird v68-v128

v14.1.1 - Plaintext export does not support unicode characters (symbol, emojis...) #639

Open john-p-knapp opened 1 month ago

john-p-knapp commented 1 month ago

It appears that some characters are being dropped in the plaintext export. Specifically, I have noticed that –“”’ (U+2013, U+201C, U+201D, U+2019) are dropped from the plaintext export but are preserved in the HTML and PDF exports.

I've tested with both the 14.1.1 & 14.1.2 beta versions. Happy to gather any additional details that would be helpful.

Thanks for all the work you do!

cleidigh commented 1 month ago

@john-p-knapp Thanks for the props! I did a few tests today and verified your findings.

It's actually a larger issue. The plaintext "converter" API in Thunderbird does not handle Unicode. Even the smile emoji is not converted. I am surprised this has not come up before. There is a technical, or perhaps philosophical, perspective that "plaintext" == ASCII, which would mean not including Unicode characters.
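One rough way to check which characters a converter drops is to compare the source text with the converted output, code point by code point. The sketch below is only an illustration: `droppedNonAscii` is a hypothetical helper name, not part of the add-on or of Thunderbird, and it only reports characters that vanish entirely from the output.

```javascript
// Hypothetical helper (not part of the add-on): list the non-ASCII code
// points that appear in `source` but nowhere in `converted`.
function droppedNonAscii(source, converted) {
  // Iterating a string yields whole code points, so emoji outside the
  // Basic Multilingual Plane are handled as single characters.
  const kept = new Set(converted);
  const dropped = new Set();
  for (const ch of source) {
    if (ch.codePointAt(0) > 0x7f && !kept.has(ch)) {
      dropped.add(ch);
    }
  }
  // Format each dropped character as U+XXXX.
  return [...dropped].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Example: a converter that stripped the four characters from the report.
console.log(droppedNonAscii("a–“b”’", "ab"));
// → [ 'U+2013', 'U+201C', 'U+201D', 'U+2019' ]
```

Running this against the same message exported as plaintext and as HTML would show whether the loss is specific to the plaintext path.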

Aside from philosophy, I am not sure I can address this with the current APIs. I will see if there are any new methods. @cleidigh

jobisoft commented 1 month ago

Hi @cleidigh !

In the latest Thunderbird 128, we added a new API: browser.messengerUtilities.convertToPlainText

That seems to correctly keep these chars:

await browser.messengerUtilities.convertToPlainText("<p>Some special chars: –“”’ (U+2013, U+201C, U+201D, U+2019)</p><p>Are not dropped in plaintext</p>")

Result:

Some special chars: –“”’ (U+2013, U+201C, U+201D, U+2019)

Are not dropped in plaintext

The API is using this under the hood, but you should of course use the API directly :-)
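A minimal sketch of calling the API directly from an add-on, assuming Thunderbird 128 or later. The names `htmlToPlainText` and `fallback` are hypothetical and used only for illustration; the fallback exists so that older versions, where browser.messengerUtilities is missing, can still use whatever converter the add-on already has.

```javascript
// Hypothetical wrapper (not the add-on's actual code): prefer the
// WebExtension API when present, otherwise defer to a caller-supplied
// converter.
async function htmlToPlainText(html, fallback) {
  // `browser` only exists inside a WebExtension context.
  const utils =
    typeof browser !== "undefined" ? browser.messengerUtilities : undefined;
  if (utils && typeof utils.convertToPlainText === "function") {
    // Available since Thunderbird 128; resolves to the plaintext string.
    return utils.convertToPlainText(html);
  }
  if (typeof fallback === "function") {
    return fallback(html);
  }
  throw new Error("No plaintext converter available");
}
```

Usage inside the add-on would then look like `const text = await htmlToPlainText(messageHtml, legacyConverter);`, where `messageHtml` and `legacyConverter` are again placeholders.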

We miss you in the developer meetings

cleidigh commented 1 week ago

@john-p-knapp Using the new HTML converter mentioned by @jobisoft, I have changed the plaintext export so that it supports conversion of all Unicode characters. You can grab from here to test:

@cleidigh

Dricc123 commented 1 day ago

Is this only for export? I ask because I have the issue with emails I imported. Accented letters in French don't display properly in some plaintext emails.

cleidigh commented 8 hours ago

@Dricc123 Import has a totally different path and does not use a plain text converter. I am not sure about what you are seeing. Do you have a sample email you could send me? test1@kokkini.net

@cleidigh

cleidigh commented 6 hours ago

@Dricc123 Got your email with your explanation. Can you send me an .eml file that shows the problem? I want to try importing it myself. @cleidigh

cleidigh commented 2 hours ago

@Dricc123 FYI, I did a round-trip export/import of a message with Unicode characters without any problem. I think I need to see one of your problem .eml files. @cleidigh