Open Facyla opened 4 months ago
Possibly related to #471
Probably. I also have the same display errors.
I think I see a pattern here!
@robin555 When you go to "Timeline" on your Suite, does it show ALL of the accented characters as their code names, for example &eacute;, &egrave;, &euml;, &ecirc; instead of é, è, ë, ê?
Yes, it does. à - â - ä - é - è - ê - ë - î - ï - ô - ö - ù - û - ü - ÿ - ç are displayed like this on "Timeline".
But in the list of notes, the display is fine.
The same problem applies to Spanish accents.
Á á É é Í í Óó Úú Ññ Üü ¡ ¿ are displayed like this
There must be some UTF8 to HTML entities conversion occurring, probably when storing text content (especially if the DB does not support UTF8 or stores text as ISO-8859-1), but the reverse process must be missing at some places where it is displayed.
A strange thing is that some other accented characters are displayed properly. It looks like character encoding and storage have been upgraded, but some parts still use the older way while others support the new (Unicode) way.
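To make the hypothesis above concrete, here is a minimal sketch of the round trip, assuming the text is passed through htmlentities() before storage. That is only a guess, as nothing in this thread confirms which function the application actually calls.

```php
<?php
// Assumption: the text is converted with htmlentities() before being stored.
$original = 'Echéancier à finir';

// UTF-8 characters become named entities.
$stored = htmlentities($original, ENT_QUOTES, 'UTF-8');
// $stored is now 'Ech&eacute;ancier &agrave; finir'

// The reverse step that seems to be missing on some screens.
$restored = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

var_dump($restored === $original); // bool(true)
```

If a screen prints the stored value and escapes it again without decoding it first, the entity names show up literally, which would match what "Timeline" does.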
@Facyla Yes. The database stores the text as UTF-8. The "Timeline" ("Echéancier" in French) looks like it's applying the "preformatted" HTML tag <pre> to display the exact characters of the text as a "code display", instead of letting the browser render the HTML codes as the correct UTF-8 characters, maybe for security, to block links and HTML saved in the user-generated text field.
OK, so a fix could be to replace the <pre> tag with an escaping function that handles any suspicious characters and injection attempts, such as strip_tags combined with UTF-8 cleaning functions that strip unexpected or exotic Unicode control characters. A helpful library for that is https://github.com/neitanod/forceutf8, which is now maintained at https://github.com/Fylax/forceutf8. Alternatively, to keep the relatively fail-safe behaviour of <pre>, apply html_entity_decode() before wrapping the resulting string in <pre>.
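A rough sketch of the first option, using only built-in PHP functions. The helper name render_note_text() is hypothetical and not part of the application's code base; the forceutf8 library mentioned above is left out to keep the example self-contained.

```php
<?php
// Hypothetical helper, not taken from the application's code base.
function render_note_text(string $stored): string
{
    // Turn stored entities (&eacute;, &#39;, &rsquo;) back into UTF-8 characters.
    $decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Drop any markup that was saved in the user-generated field.
    $plain = strip_tags($decoded);

    // Escape only the HTML-significant characters; accented letters pass through untouched.
    return htmlspecialchars($plain, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo render_note_text('Ech&eacute;ancier &lt;b&gt;urgent&lt;/b&gt;'), "\n";
// prints: Echéancier urgent
```

Decoding before strip_tags means that tags stored in entity form are removed too. One caveat: strip_tags can swallow text that follows a literal "<", so notes containing comparisons such as "a < b" would need extra care.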
Issue
Some translated strings appear with the HTML entity (e.g. "& # 39 ;", space added so it displays) instead of the intended character ("'"), at least for the single quote character. Examples can be seen in the Admin area and in the navigation menu.
Expected Behavior
The HTML entity should display the relevant character: here a single quote.
Actual Behavior
The displayed text is the literal HTML numeric entity: & # 39 ; (space added so it displays). The same behaviour occurs with the named HTML entity & rsquo ; (space added so it displays).
There might also be an issue with a built-in mechanism that tries to convert/escape some strings in a common manner: changing the source string from & # 39 ; to & rsquo ; in the source language files (here in modules/Administration/language/fr_FR.lang.php) seems to "convert" it back to & # 39 ; in the cache file located in the cache/prod/pools/ folder.
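One common way this symptom arises, though it is not confirmed for this code base, is escaping a label that already contains an entity. The sketch below uses a made-up label and the double_encode parameter of htmlspecialchars() to show the difference.

```php
<?php
// Assumption: the label already contains an entity when it reaches the template.
$label = 'l&#39;utilisateur';

// Default behaviour: the ampersand is escaped again, so the browser
// shows the literal text l&#39;utilisateur.
$escapedTwice = htmlspecialchars($label, ENT_QUOTES, 'UTF-8');
// $escapedTwice is 'l&amp;#39;utilisateur'

// With double_encode set to false, existing entities are left alone
// and the browser renders a single quote.
$escapedOnce = htmlspecialchars($label, ENT_QUOTES, 'UTF-8', false);
// $escapedOnce is 'l&#39;utilisateur'

echo $escapedTwice, "\n", $escapedOnce, "\n";
```

Checking the page source for &amp;#39; in the affected menus would tell whether this is what happens here.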
Possible Fix
Steps to Reproduce
Context
Navigating through the Admin area after testing upgrades.
Environment