Closed filiplikavcan closed 1 month ago
Our primary issue stems from the double encoding of special characters. Initially, we encode them in HTMLEditorField::ValueEntities(), encompassing all content within the WYSIWYG text area. Subsequently, on the client side within TinyMCE_ssmedia, we process the line passing the filter img[data-shortcode="image"], encoding it once more.
Given that this encoding is necessary when a user initially adds an image, we need to address cases where users simply save or reload existing content containing the image short code.
To address this, we should turn on double_encode (the fourth argument of htmlentities()) in HTMLEditorField::ValueEntities(), so that users can use plugins to insert example code or paste code snippets directly into the text.
public function ValueEntities(){
-return htmlentities($this->Value() ?? '', ENT_COMPAT, 'UTF-8', false);
+return htmlentities($this->Value() ?? '', ENT_COMPAT, 'UTF-8', true);
}
Alternatively, developing a custom plugin for example code insertion is an option, though it may be labor-intensive and likely won't garner significant demand.
To rectify the situation with special characters in the image shortcode, it's advisable to modify ShortcodeSerialiser.sanitiseShortCodeProperties and ShortcodeSerialiser.createHTMLSanitiser (See) by adding an exception for strings that contain an ampersand. Specifically, we should check whether one of amp;, quot;, #039;, lt; or gt; follows the & symbol. If it does, that substring is already an entity and should be excluded from processing.
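The ampersand check described above could look roughly like the sketch below. This is not the actual ShortcodeSerialiser code: escapeAttributeOnce and BARE_AMPERSAND are hypothetical names, and the assumption is that the sanitiser only needs to cover the five entities listed above.

```javascript
// Hypothetical helper, NOT part of ShortcodeSerialiser. It escapes a shortcode
// attribute value but leaves the five already-encoded entities alone.

// Matches "&" only when it is NOT followed by amp; quot; #039; lt; or gt;
const BARE_AMPERSAND = /&(?!(?:amp|quot|#039|lt|gt);)/g;

function escapeAttributeOnce(value) {
  return String(value)
    .replace(BARE_AMPERSAND, '&amp;') // bare ampersands only
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// First insert of an image: the raw title gets encoded.
const firstSave = escapeAttributeOnce('Tom & "Jerry"');
// Save or reload of existing content: the encoded title passes through unchanged.
const secondSave = escapeAttributeOnce(firstSave);

console.log(firstSave);
console.log(firstSave === secondSave);
```

The point of the negative lookahead is that the function becomes idempotent: running it on a value that was already encoded on the server side does not produce &amp;amp;.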
That's a great investigation!
It does look like "double encoding" is the correct behaviour in this context. You don't want HTMLEditor to second guess the data it is getting.
Patching ShortcodeSerialiser.sanitiseShortCodeProperties and ShortcodeSerialiser.createHTMLSanitiser seems like the appropriate approach here.
I did notice that the Insert Embed modal had the same problem InsertMediaModal had with &. Also, none of the modals will let me save " in their field.
All things considered, I would like content authors to be able to put whatever text they want in any context. But if I have to choose, I would rather they be able to correctly write markup text in the WYSIWYG than be able to put & or " in form modals.
Module version(s) affected
5.1.23
Description
This commit: https://github.com/silverstripe/silverstripe-framework/commit/037168a4fe9759b7f464ee8403dab612a570bab6 turned off double encoding of HTML entities in the HTMLEditorField value in an attempt to fix an issue with HTML entities in shortcodes. That introduced a major bug which prevents TinyMCE from showing XML/HTML code examples.
How to reproduce
Use <strong>tag</strong> to make it bold.
<p>Use &lt;strong&gt;tag&lt;/strong&gt; to make it bold.</p>
is stored in the DB. This is the correct value. The reason is that HTMLEditorField doesn't double encode HTML entities (the 4th parameter of the htmlentities function is set to false) and produces this textarea value:
instead of this:
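The difference between the two textarea values can be illustrated with a rough JavaScript model of the PHP call. Everything here is an assumption made for illustration: modelHtmlEntities only covers &, <, > and " (the real htmlentities encodes many more characters), textareaDecode stands in for the single round of entity decoding a browser applies to textarea content, and the stored value is assumed to hold the example's angle brackets as &lt; and &gt;.

```javascript
// Rough model of htmlentities($value, ENT_COMPAT, 'UTF-8', $doubleEncode).
// Hypothetical function names; only &, <, > and " are handled.
function modelHtmlEntities(value, doubleEncode) {
  // With doubleEncode = false, an "&" that already starts an entity is skipped.
  const ampersand = doubleEncode
    ? /&/g
    : /&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#[xX][0-9a-fA-F]+;)/g;
  return value
    .replace(ampersand, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Stand-in for the browser: textarea content is entity-decoded exactly once.
function textareaDecode(markup) {
  return markup
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

// Assumed DB value: the code example is stored with its brackets as entities.
const stored = '<p>Use &lt;strong&gt;tag&lt;/strong&gt; to make it bold.</p>';

const withoutDoubleEncode = modelHtmlEntities(stored, false);
const withDoubleEncode = modelHtmlEntities(stored, true);

console.log(withoutDoubleEncode);
console.log(withDoubleEncode);

// What the editor receives after the textarea is decoded:
console.log(textareaDecode(withoutDoubleEncode)); // the example has become real markup
console.log(textareaDecode(withDoubleEncode) === stored); // the stored value survives
```

In this model, only the double-encoded value round-trips: after one decode the editor gets back exactly what is in the DB, whereas the single-encoded value loses the distinction between the paragraph's own tags and the code example inside it.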
Acceptance criteria
Related
PR