Open john-p-knapp opened 1 month ago
@john-p-knapp Thanks for the props! I did a few tests today and verified your findings.
It's actually a larger issue. The plaintext "converter" API in Thunderbird does not handle Unicode. Even the smile emoji is not converted. I am surprised this has not come up before. There may be a technical or perhaps philosophical view that "plaintext" == ASCII, which would mean excluding non-ASCII Unicode characters.
Aside from philosophy, I am not sure I can address this with the current APIs. I will see if there are any new methods. @cleidigh
Hi @cleidigh !
In the latest Thunderbird 128, we added a new API: browser.messengerUtilities.convertToPlainText
That seems to correctly keep these chars:
await browser.messengerUtilities.convertToPlainText("<p>Some special chars: –“”’ (U+2013, U+201C, U+201D, U+2019)</p><p>Are not dropped in plaintext</p>")
Result:
Some special chars: –“”’ (U+2013, U+201C, U+201D, U+2019)
Are not dropped in plaintext
The API is using this under the hood, but you should of course use the API directly :-)
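For add-ons that also need to run on Thunderbird versions older than 128, a feature check around the call might look like the sketch below. This is only an illustration: toPlainText, browserApi and fallback are hypothetical names that do not come from this thread, and the fallback converter is whatever the add-on used before.

```javascript
// Hypothetical helper (not part of any Thunderbird API): converts HTML to
// plain text through browser.messengerUtilities.convertToPlainText when it
// is available, and otherwise defers to a caller-supplied fallback.
async function toPlainText(browserApi, html, fallback) {
  const utils = browserApi && browserApi.messengerUtilities;
  if (utils && typeof utils.convertToPlainText === "function") {
    // API added in Thunderbird 128, per the comment above.
    return await utils.convertToPlainText(html);
  }
  // Assumption: on versions without the API, use the add-on's own converter.
  return fallback(html);
}
```

Inside an add-on this would be called as await toPlainText(browser, html, legacyConvert), where legacyConvert stands in for the existing converter function.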
We miss you in the developer meetings
@john-p-knapp Using the new HTML converter mentioned by @jobisoft, I have changed the plaintext export so that it supports conversion of all Unicode characters. You can grab it from here to test:
@cleidigh
Is this only for export? I ask because I have the issue with emails I imported. Accented letters in French don't display properly in some plaintext emails.
@Dricc123 Import uses a totally different code path and does not use a plaintext converter. I am not sure what you are seeing. Do you have a sample email you could send me? test1@kokkini.net
@cleidigh
@Dricc123 Got your email with your explanation. Can you send me an EML file that shows the problem? I want to try importing it myself. @cleidigh
@Dricc123 FYI I did a round-trip export/import of a message with Unicode characters without any problem. I think I need to see one of your problem EML files. @cleidigh
It appears that some characters are getting dropped in the plaintext export. I have specifically noticed that –“”’ (U+2013, U+201C, U+201D, U+2019) are dropped in plaintext but are preserved in the HTML and PDF exports.
I've tested with both the 14.1.1 & 14.1.2 beta versions. Happy to gather any additional details that would be helpful.
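One possible way to collect such details is to compare the text of the HTML export with the plaintext export and list the non-ASCII characters that went missing. The sketch below assumes both exports are already loaded as strings. droppedCodePoints is a hypothetical helper name, not something provided by the add-on or by Thunderbird.

```javascript
// Hypothetical helper: lists non-ASCII characters that appear in `source`
// but nowhere in `exported`, formatted as U+XXXX code points.
function droppedCodePoints(source, exported) {
  const kept = new Set(exported); // string iteration is by code point
  const dropped = new Set();
  for (const ch of source) {
    if (ch.codePointAt(0) > 0x7f && !kept.has(ch)) {
      dropped.add(ch); // Set keeps first-seen order and removes duplicates
    }
  }
  return [...dropped].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Example with the characters reported above:
console.log(droppedCodePoints("Some special chars: –“”’", "Some special chars: "));
// prints [ 'U+2013', 'U+201C', 'U+201D', 'U+2019' ]
```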
Thanks for all the work you do!