salesagility / SuiteCRM-Core

SuiteCRM - Open source CRM for the world
https://www.suitecrm.com
GNU Affero General Public License v3.0

HTML entities displayed in frontend interface #474

Open Facyla opened 4 months ago

Facyla commented 4 months ago

Issue

Some translated strings are displayed with the raw HTML entity (e.g. "& # 39 ;", spaces added so that it displays here) instead of the intended character ("'"). This happens at least for the single quote character; examples can be seen in the Admin area and in the navigation menu (screenshots attached).

Expected Behavior

The HTML entity should display the relevant character: here a single quote.

Actual Behavior

The displayed text is the literal numeric HTML entity: & # 39 ; (spaces added so that it displays here). The same happens with the named HTML entity & rsquo ; (spaces added so that it displays here).

There might also be an issue with a built-in mechanism that tries to convert/escape some strings in a common manner: changing the source string from & # 39 ; to & rsquo ; in the source language files (here modules/Administration/language/fr_FR.lang.php) seems to "convert" it back to & # 39 ; in the cache file located in the cache/prod/pools/ folder.

Possible Fix

Steps to Reproduce

  1. Export the translations from Crowdin
  2. Change some strings to replace numeric HTML entities with named HTML entities, and others with plain UTF-8 characters
  3. Package the translation module (i.e. zip it) and replace the current translation file with the new one
  4. Check how the changed translations render

Context

Navigating through the Admin area after testing upgrades.

Environment

chris001 commented 4 months ago

Possibly related to #471

robin555 commented 4 months ago

Possibly related to #471

Probably. I also have the same display errors.

image

chris001 commented 4 months ago

I think I see a pattern here!

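@robin555 When you go to "Timeline" on your Suite, does it show ALL of the accented characters as their entity code names (for example é, è, ë, ê displayed as & eacute ;, & egrave ;, & euml ;, & ecirc ;, spaces added so that they display here)?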

robin555 commented 4 months ago

Yes, it does. à - â - ä - é - è - ê - ë - î - ï - ô - ö - ù - û - ü - ÿ - ç are displayed like this on "Timeline":

crm30

But in the list of notes, the display is fine

crm31

robin555 commented 4 months ago

The same problem applies to Spanish accents.

Á á É é Í í Ó ó Ú ú Ñ ñ Ü ü ¡ ¿ are displayed like this:

crm32

Facyla commented 4 months ago

There must be some UTF-8 to HTML entities conversion occurring, probably when the text content is stored (especially if the DB does not support UTF-8 or stores text as ISO-8859-1), but the reverse conversion must be missing in some of the places where the text is displayed.

A strange thing is that some other accented characters are displayed properly. It looks like character encoding and storage have been upgraded at some point, but some parts still use the older way while others support the new (Unicode) way.
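To make that hypothesis concrete, here is a minimal sketch of how text that was entity-encoded before storage ends up showing its entity codes once it is escaped a second time at display. It is written in Python only as a neutral illustration; the stored value is made up, and nothing here is taken from the SuiteCRM code base or confirmed as the actual cause.

```python
import html

# Hypothetical stored value: the text was entity-encoded before being saved.
stored = "L&#39;&eacute;ch&eacute;ancier"

# Escaping the stored value again for output turns every "&" into "&amp;",
# so the browser shows the entity source instead of the character.
double_encoded = html.escape(stored, quote=False)
print(double_encoded)  # L&amp;#39;&amp;eacute;ch&amp;eacute;ancier

# Decoding first, then escaping once, yields text that renders as intended.
decoded = html.unescape(stored)
print(decoded)  # L'échéancier
print(html.escape(decoded, quote=False))  # L'échéancier
```

If this is what happens, the views that display correctly would be the ones that decode (or never re-escape) the stored value, which would match the mix of good and bad rendering described above.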

chris001 commented 4 months ago

@Facyla Yes. The database stores the text as UTF-8. The "Timeline" ("Echéancier" in French) looks like it's applying the "preformatted" HTML tag <pre> to display the exact characters of the text as a "code display", instead of letting the browser render the HTML codes as the correct UTF-8 characters, maybe for security, to block links and HTML saved in the user-generated text field.

Facyla commented 4 months ago

Ok, so a fix could be to replace the <pre> tag with an escaping function that handles any suspicious characters and injection attempts, such as strip_tags combined with UTF-8 cleaning functions that strip unexpected/exotic Unicode control characters. A helpful library for that could be https://github.com/neitanod/forceutf8, which is now maintained here: https://github.com/Fylax/forceutf8. Alternatively, if one wants to keep the relatively fail-safe behaviour of <pre>, apply html_entity_decode() before wrapping the resulting string in <pre>.
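As a rough sketch of that idea (decode the entities, drop tags and control characters, then escape exactly once for output), something like the following could work. It is written in Python purely for illustration; `clean_for_display` and `TAG_RE` are hypothetical names, the tag pattern is a crude stand-in for PHP's strip_tags(), and the real fix would have to live in the PHP or frontend code that renders the Timeline.

```python
import html
import re
import unicodedata

# Crude tag matcher, for illustration only (not a real HTML sanitizer).
TAG_RE = re.compile(r"<[^>]*>")


def clean_for_display(raw: str) -> str:
    """Decode entities, drop tags and control characters, then escape once."""
    text = html.unescape(raw)    # "&#39;" or "&rsquo;" becomes the real character
    text = TAG_RE.sub("", text)  # rough analogue of PHP strip_tags()
    text = "".join(
        ch for ch in text
        # Keep newlines and tabs, drop every other control character (Cc).
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Escape "&", "<" and ">" exactly once so the result is safe to embed.
    return html.escape(text, quote=False)


print(clean_for_display("L&#39;&eacute;ch&eacute;ancier"))  # L'échéancier
print(clean_for_display("<script>alert(1)</script>Note"))   # alert(1)Note
```

The result can then be placed inside <pre> (or any other element) without the entity codes showing up. Note that the regex would also swallow legitimate text between a literal "<" and a later ">", so a proper sanitizer would be needed in practice.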