processwire / processwire-issues

ProcessWire issue reports.
44 stars 2 forks source link

Swap strip_tags and htmlspecialchars in ProcessPageList so valid titles don't get cut off #1111

Open schwarzdesign opened 4 years ago

schwarzdesign commented 4 years ago

Short description of the issue

Titles that contain the exact string <= are cut off in the page tree. This is caused by strip_tags in the method that renders the page title.

As an example, take the following page title:

Mini-Jobber (gross <= 450 €/month)

Expected behavior

The title should be displayed completely in the page tree:

Mini-Jobber (gross <= 450 €/month)

Actual behavior

The title gets cut off, because strip_tags removes anything after <=:

Mini-Jobber (gross

Screenshots/Links that demonstrate the issue

With a space between the characters it works fine:

with-space

Without the space, the title gets cut off:

without-space

Suggestion for a possible fix

The issue is caused by this line in the source code. The titles get processed by both strip_tags and htmlspecialchars. Simply switching the order of those two functions would solve the problem:

php > $str = 'Mini-Jobber (gross <= 450 €/month)';
php > var_dump(htmlspecialchars(strip_tags($str), ENT_QUOTES, "UTF-8", false));
string(19) "Mini-Jobber (gross "
php > var_dump(strip_tags(htmlspecialchars($str, ENT_QUOTES, "UTF-8", false)));
string(39) "Mini-Jobber (gross &lt;= 450 €/month)"

Another possibility would be to drop strip_tags entirely, from a security standpoint it's not really required if you use htmlspecialchars anyway.

Steps to reproduce the issue

  1. On a standard ProcessWire installation, create a page with the exact characters <= somewhere inside it's title.
  2. The title in the page tree gets cut off at this point.

Setup/Environment

schwarzdesign commented 4 years ago

Small update: As pointed out by RobinS, the issue will only appear if the default HTML Entity Encoder textformatter on the title field is turned off, since it would encode the <= before strip_tags is applied.

ryancramerdesign commented 3 years ago

@schwarzdesign Swapping the order of htmlspecialchars and strip_tags means that strip_tags does nothing at all, so at that stage could just be removed. The strip_tags is not necessary for security, as the htmlspecialchars is all that is necessary. But the strip_tags is there for looks, so that encoded HTML tags don't show in the page list, like if it was added from markdown formatting in the title or some other formatting. If I recall correctly, this was done (a long time ago) to resolve a previous issue report about encoded HTML tags showing in the page list. I'm not sure what's the right way to go here as it's subjective. I could have it allow <= as a specific case?

schwarzdesign commented 3 years ago

@ryancramerdesign I'd argue that strip_tags could be removed completely if it's not required from a security standpoint. As long as HTML is properly encoded, why prevent it showing up in the page tree? The assumption that I never want to see HTML tags in the page tree isn't necessarily true (aside from edge-cases like this one). What if I want to build a system where the client selects an HTML element from a list and represent each element using a page? Then I would have page titles like <strong>, <em> etc and they wouldn't show up at all.

I'd say it's pretty rare for HTML to end up in titles, so if I have a situation where it does, I probably want it to show up in the page tree as well. And if I have a situation where I need HTML in the title but want to strip_tags in the page tree, I can solve that using hooks.

So since it's not required for security, I think the core should make as few assumptions as possible, so I'm all for dropping strip_tags entirely.

matjazpotocnik commented 1 year ago

Looks like $sanitizer->markupToText() removes <= in $value = trim(strip_tags($value));

ryancramerdesign commented 1 year ago

@matjazpotocnik I can't duplicate the issue here, are you able to? Maybe it's already been fixed? Page title of Mini-Jobber (gross <= 450 €/month) shows exactly as that in the page list and in lister.

matjazpotocnik commented 1 year ago

@ryancramerdesign Yes, I can duplicate:

image

image

Page title "Mini-Jobber (gross <= 450 €/month)" is shown as "Mini-Jobber (gross":

image

ryancramerdesign commented 1 year ago

@matjazpotocnik Strange, I can't duplicate it here on the current dev branch.

Screen Shot 2023-07-21 at 1 34 27 PM Screen Shot 2023-07-21 at 1 35 08 PM
matjazpotocnik commented 1 year ago

@ryancramerdesign please disable (remove from the list of textformatters) HTML Entity Encoder on the title field. If entity encoder is on, then < will be replaced with &lt; and all is good.