silverstripe / silverstripe-framework

Silverstripe Framework, the MVC framework that powers Silverstripe CMS
https://www.silverstripe.org
BSD 3-Clause "New" or "Revised" License

HTMLEditorField is not able to show html/xml code examples #11207

Closed filiplikavcan closed 1 month ago

filiplikavcan commented 2 months ago

Module version(s) affected

5.1.23

Description

This commit: https://github.com/silverstripe/silverstripe-framework/commit/037168a4fe9759b7f464ee8403dab612a570bab6 turned off double encoding of HTML entities in the HTMLEditorField value in an attempt to fix an issue with HTML entities in shortcodes. That introduced a major bug which prevents TinyMCE from showing XML/HTML code examples.

How to reproduce

  1. Type this text into any HTML field (as visible text in TinyMCE, not as source code): Use <strong>tag</strong> to make it bold.
  2. Save/Publish
  3. <p>Use &lt;strong&gt;tag&lt;/strong&gt; to make it bold.</p> is stored in the DB. This is the correct value.
  4. However, what you now see in TinyMCE is: Use tag to make it bold. (with "tag" rendered as bold text instead of the literal tags being shown)

The reason is that HTMLEditorField doesn't double encode HTML entities (the 4th parameter of the htmlentities function, double_encode, is set to false) and produces this textarea value:

&lt;p&gt;Use &lt;strong&gt;tag&lt;/strong&gt; to make it bold.&lt;/p&gt;

instead of this:

&lt;p&gt;Use &amp;lt;strong&amp;gt;tag&amp;lt;/strong&amp;gt; to make it bold.&lt;/p&gt;
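
The difference between the two textarea values above can be reproduced outside Silverstripe with plain PHP. This is a minimal sketch, not framework code: it assumes the stored value from step 3, and it uses html_entity_decode as a stand-in for the single round of entity decoding a browser applies to textarea content before TinyMCE reads it.

```php
<?php
// Value as stored in the DB (step 3 of the reproduction).
$stored = '<p>Use &lt;strong&gt;tag&lt;/strong&gt; to make it bold.</p>';

// double_encode = false: entities already present are left untouched.
$single = htmlentities($stored, ENT_COMPAT, 'UTF-8', false);

// double_encode = true: the "&" of every existing entity is encoded again.
$double = htmlentities($stored, ENT_COMPAT, 'UTF-8', true);

// Assumption: decoding once approximates what the browser hands to the editor.
$seenSingle = html_entity_decode($single, ENT_COMPAT, 'UTF-8');
$seenDouble = html_entity_decode($double, ENT_COMPAT, 'UTF-8');

var_dump($seenSingle === $stored); // bool(false): the literal tags became real markup
var_dump($seenDouble === $stored); // bool(true): the stored value round-trips intact
```

With double_encode set to false the decoded value is `<p>Use <strong>tag</strong> to make it bold.</p>`, which is why the editor shows bold text rather than the code example.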

sabina-talipova commented 2 months ago

Our primary issue stems from the double encoding of special characters. Initially, we encode them in HTMLEditorField::ValueEntities(), encompassing all content within the WYSIWYG text area. Subsequently, on the client side within TinyMCE_ssmedia, we process the line passing the filter img[data-shortcode="image"], encoding it once more.

Given that this encoding is necessary when a user initially adds an image, we need to address cases where users simply save or reload existing content containing the image short code.

To address this, we should enable double_encode (the 4th parameter of htmlentities) in HTMLEditorField::ValueEntities(), so that users can use plugins to insert example code or paste code snippets directly into the text.

 public function ValueEntities()
 {
-    return htmlentities($this->Value() ?? '', ENT_COMPAT, 'UTF-8', false);
+    return htmlentities($this->Value() ?? '', ENT_COMPAT, 'UTF-8', true);
 }

Alternatively, developing a custom plugin for example code insertion is an option, though it may be labor-intensive and likely won't garner significant demand.

To rectify the situation with special characters in the image short code, it's advisable to modify ShortcodeSerialiser.sanitiseShortCodeProperties and ShortcodeSerialiser.createHTMLSanitiser (See) by adding exceptions for cases where an ampersand is present in the string. Specifically, we should check whether one of the combinations amp;, quot;, #039;, lt;, gt; follows the & symbol. If it does, that substring is already an encoded entity and should be excluded from processing.
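
The ampersand check described above can be sketched with a negative lookahead. Note the assumptions: ShortcodeSerialiser is client-side code, so this PHP version only illustrates the logic, and encodeOnce is a hypothetical helper name, not an existing Silverstripe function.

```php
<?php
/**
 * Hypothetical helper: encode special characters in a shortcode attribute
 * value, skipping any "&" that already starts one of the five entities
 * listed above (amp;, quot;, #039;, lt;, gt;).
 */
function encodeOnce(string $value): string
{
    // Encode only ampersands that are NOT followed by a known entity name.
    $value = preg_replace('/&(?!(?:amp|quot|#039|lt|gt);)/', '&amp;', $value);

    // Encode the remaining special characters; none of them contains a bare "&"
    // problem, because every "&" left in $value now starts a valid entity.
    return strtr($value, [
        '"' => '&quot;',
        "'" => '&#039;',
        '<' => '&lt;',
        '>' => '&gt;',
    ]);
}

echo encodeOnce('Tom & Jerry'), "\n";     // Tom &amp; Jerry
echo encodeOnce('Tom &amp; Jerry'), "\n"; // Tom &amp; Jerry (unchanged)
```

Because already-encoded entities are skipped, running the helper again on its own output changes nothing, which is the property needed when existing content is saved or reloaded repeatedly.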

maxime-rainville commented 1 month ago

That's a great investigation!

It does look like "double encoding" is the correct behaviour in this context. You don't want HTMLEditor to second guess the data it is getting.

Patching ShortcodeSerialiser.sanitiseShortCodeProperties and ShortcodeSerialiser.createHTMLSanitiser seems like the appropriate approach here.

I did notice that the Insert Embed modal had the same problem InsertMediaModal had with &. Also, none of the modals will let me save " in their field.

All things considered, I would like content authors to be able to put whatever text they want in any context. But if I have to choose, I would rather they be able to correctly write markup text in the WYSIWYG than be able to put & or " in form modals.