jens-maus / yam

YAM (short for 'Yet Another Mailer') is a MIME-compliant open-source Internet email client written for Amiga-based computer systems (AmigaOS4, AmigaOS3, MorphOS, AROS). It supports POP3, SMTP, TLSv1/SSLv3 connection security, multiple users, multiple identities, PGPv2/v5 encryption, unlimited hierarchical folders, an ARexx interface, etc.
https://yam.ch
GNU General Public License v2.0

Wrong characters in text #616

Closed jens-maus closed 8 years ago

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-02-04 08:47:09 +0100


Summary

I receive e-mails that contain only an HTML part. When I double-click that part to open it in OWB, the accented characters are missing. I have attached a screengrab as well as the e-mail exported from YAM.

Steps to reproduce

  1. Just open the HTML part in a browser.

Expected results

Actual results

Regression

Notes

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-02-04 08:48:09 +0100


Attachment added: OWB RMS COMMUNICATIONS.png (94.3 KiB)

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-02-04 08:52:08 +0100


Attachment added: Nouvelle_commande_client__285__-_4_fevrier_2016.eml (9.9 KiB)

tboeckel commented 8 years ago

Originally on 2016-02-11 09:14:19 +0100


The problem with this mail is that it is HTML-only and the HTML document itself contains information about its encoding. Both the mail's header and the document declare UTF-8 encoding, which is correct.

YAM internally converts all displayable contents to UTF-8 and then back to the system charset when the contents are eventually displayed. This also happens for HTML contents: they are converted to, e.g., ISO-8859-1 and then passed to Odyssey, IBrowse or whatever browser is configured. This is where the problem arises, because the HTML document has its own meta header that declares UTF-8, although YAM has changed the actual encoding to ISO-8859-1. The browser then tries to treat the ISO-8859-1 data as UTF-8 (because the document tells it to) and hence displays wrong characters.
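The mismatch described above can be reproduced in a few lines. This is only an illustrative Python sketch, not YAM's actual code (YAM is written in C). The sample HTML string and the choice of ISO-8859-1 as the system charset are assumptions made for the demonstration.

```python
# Illustrative only: reproduces the charset mismatch, not YAM's implementation.
html_utf8 = '<meta charset="utf-8"><p>Fax reçu</p>'.encode("utf-8")

# Step 1: the mail part arrives as UTF-8 bytes and is decoded to text.
text = html_utf8.decode("utf-8")

# Step 2: the text is converted to the system charset (ISO-8859-1 assumed).
latin1_bytes = text.encode("iso-8859-1")

# Step 3: the browser trusts the meta tag and decodes the bytes as UTF-8.
# The single byte 0xE7 ("ç" in ISO-8859-1) is not valid UTF-8 on its own,
# so the decoder substitutes U+FFFD, the replacement character.
shown_by_browser = latin1_bytes.decode("utf-8", errors="replace")

print(shown_by_browser)
```

The meta tag survives the conversion untouched because it is pure ASCII, which is exactly why the browser is misled: the declaration still says UTF-8 while the bytes no longer are.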

I am really not sure whether it is feasible to parse the HTML document for these meta headers, skip the final conversion if embedded encoding information is found, and let the browser do the dirty work.

tboeckel commented 8 years ago

Originally on 2016-02-13 21:13:44 +0100


Having thought about this issue for some time, I came to the conclusion that it might be a good idea to skip that final conversion for HTML documents. Skipping it results in the supplied example document being displayed correctly in IBrowse. I think Odyssey will do likewise.

tboeckel commented 8 years ago

Originally on 2016-02-13 21:15:28 +0100


In (c4b904a):

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-03-02 07:49:40 +0100


Sorry, but this still does not seem to work :-( I still get e-mails with wrong characters shown in HTML. I have attached such an e-mail. If you could have a look and, if possible, solve the problem, it would really be very helpful!

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-03-02 07:52:10 +0100


Attachment added: FW__installation_Dr_Francis.eml (104.2 KiB)

tboeckel commented 8 years ago

Originally on 2016-03-02 08:56:42 +0100


Ok, I see that the internal conversion to UTF8 is still happening. This is why you end up with wrong characters. I think it is better to treat HTML documents like binary data and keep them in their original state.

tboeckel commented 8 years ago

Originally on 2016-03-02 08:58:22 +0100


In (df78784):

tboeckel commented 8 years ago

Originally on 2016-03-02 08:59:56 +0100


In (23a2b81):

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-03-14 07:10:02 +0100


Hi, it seems that there are still e-mails that do not show the correct characters, like the one I have attached to this ticket. Could you please fix this as well? Thank you very much in advance.

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-03-14 07:14:44 +0100


Attachment added: Analysez_vos_plongees.eml (16.4 KiB)

tboeckel commented 8 years ago

Originally on 2016-03-16 15:09:34 +0100


Sigh. This was bound to happen sooner or later. The HTML part of your last example mail must be converted by YAM again to be displayed correctly, while the second example mail must be excluded from conversion, because the HTML part's meta data explicitly states the charset.
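The rule described here (convert only when the HTML part does not declare its own charset) might be sketched as follows. This is a Python illustration under stated assumptions, not the code from the referenced commit: the function names `declared_charset` and `prepare_html_part` are hypothetical, and the regular expression is a deliberately simple stand-in for real meta-tag parsing.

```python
import re

# Hypothetical helpers for illustration; YAM's real logic (in C) may differ.
# Matches both <meta charset="..."> and the http-equiv "content=...charset=..." form.
_META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)

def declared_charset(html: bytes):
    """Return the charset named in a <meta> tag (lowercased), or None."""
    match = _META_CHARSET.search(html[:4096])  # only scan the start of the document
    return match.group(1).decode("ascii").lower() if match else None

def prepare_html_part(raw: bytes, mail_charset: str, system_charset: str) -> bytes:
    """Return the bytes that would be handed to the external browser."""
    if declared_charset(raw) is not None:
        # The document names its own encoding: pass the bytes through untouched.
        return raw
    # No embedded declaration: transcode from the mail's charset to the system one.
    text = raw.decode(mail_charset, errors="replace")
    return text.encode(system_charset, errors="replace")
```

With this split, a part such as `<meta charset="utf-8">...` is passed through byte for byte, while a part without any meta declaration is still transcoded to the system charset, which matches the two example mails discussed in this comment.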

tboeckel commented 8 years ago

Originally on 2016-03-16 15:13:10 +0100


In (ecfc551):

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-04-08 07:04:51 +0200


Sorry, but I still get wrong characters, now in the Subject line :-( Please see the e-mail and screengrab.

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-04-08 07:05:20 +0200


Attachment added: Fax_von_0216196708.eml (77.2 KiB)

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-04-08 07:06:15 +0200


Attachment added: wrong_caracter_subject.png (47.9 KiB)

tboeckel commented 8 years ago

Originally on 2016-04-08 07:32:01 +0200


Replying to rmsyam:

Sorry, but I still get wrong characters, now in the Subject line :-(

This is completely unrelated to this issue and is caused by the mail itself rather than by an HTML text part.

Look at this From: header line:

From: "Fax reçu par la Fritzbox" <christoph.poelzl@rmsvideo.ch>

This line is already broken, because it contains UTF-8 characters but gives no hint that it is indeed UTF-8 encoded. A valid line would look like this:

From: =?UTF-8?Q?Fax_re=c3=a7u_par_la_Fritzbox?= <christoph.poelzl@rmsvideo.ch>

There is nothing we can do about this. Sorry.
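For readers unfamiliar with the encoded-word syntax shown above, the following Python sketch decodes the valid header with the standard library's `email.header` module and contrasts it with raw, undeclared UTF-8 bytes. The fallback to ISO-8859-1 in the second half is an assumption for illustration; what a given client actually displays depends on its own defaults.

```python
from email.header import decode_header, make_header

# The valid header from above: the encoded word itself names the charset (UTF-8)
# and the encoding (Q, a quoted-printable variant where "_" stands for a space).
valid = "=?UTF-8?Q?Fax_re=c3=a7u_par_la_Fritzbox?= <christoph.poelzl@rmsvideo.ch>"
decoded = str(make_header(decode_header(valid)))
print(decoded)

# The broken header carries raw UTF-8 bytes with no charset declaration.
# A receiver that assumes ISO-8859-1 (an assumption here) sees two wrong
# characters in place of the single "ç".
raw_bytes = "Fax reçu par la Fritzbox".encode("utf-8")
misread = raw_bytes.decode("iso-8859-1")
print(misread)
```

Because the raw header gives the receiver nothing to go on, any charset it picks is a guess, which is why the sending side is the only place where this can be fixed reliably.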

jens-maus commented 8 years ago

Originally by christoph.poelzl@rmsvideo.ch on 2016-04-08 07:37:56 +0200


Oh, what a shame :-( Well, if there is nothing that can be done, then I guess I must live with it. Thanks anyway.