zendframework / zend-escaper

Escaper component from Zend Framework
BSD 3-Clause "New" or "Revised" License

Attribute escaping #21

Open autowp opened 7 years ago

autowp commented 7 years ago

Why does attribute escaping need to encode such a large number of characters (everything matching [^a-z0-9,\.\-_])? URLs in HTML look ugly and are larger than necessary:

<a href="https&#x3A;&#x2F;&#x2F;www.example.com&#x2F;">
<a href="https://www.example.com/">

Ocramius commented 7 years ago

"Ugly" is not the problem in security-sensitive contexts. Also, most source viewers already make these attributes simple to read (Firefox does, for example).

As for the size, gzip compression generally deals with it.

autowp commented 7 years ago

It is not easy to understand where the security improvement is here.

For example, why is a "dot" a safe character but a "semicolon" is not?

As for the size, on my example Cyrillic page, where escapeHtmlAttr is partially used:

68988 bytes - only quotes and angle brackets escaped
83611 bytes - escaped by escapeHtmlAttr (+20%)

Same with gzip:

11116 bytes
11790 bytes (+6%)

Indeed, the size is not crucial.
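
A comparison like the one above can be reproduced with a few lines of PHP. This is only a rough sketch: the `measureSizes()` helper and the sample markup are made up for illustration, `gzencode()` needs the zlib extension, and a highly repetitive sample compresses far better than a real page, so the numbers will not match the ones quoted.

```php
<?php
// Hypothetical helper: report the raw and gzip-compressed size of a string.
// Assumes the zlib extension is loaded (gzencode() comes from it).
function measureSizes(string $html): array
{
    return [
        'raw'  => strlen($html),
        'gzip' => strlen(gzencode($html, 9)),
    ];
}

// Made-up sample markup: the same link, once plain and once attribute-escaped.
$plain   = str_repeat('<a href="https://www.example.com/">link</a>' . "\n", 200);
$escaped = str_repeat('<a href="https&#x3A;&#x2F;&#x2F;www.example.com&#x2F;">link</a>' . "\n", 200);

$plainSizes   = measureSizes($plain);
$escapedSizes = measureSizes($escaped);

printf("plain:   %d bytes raw, %d bytes gzip\n", $plainSizes['raw'], $plainSizes['gzip']);
printf("escaped: %d bytes raw, %d bytes gzip\n", $escapedSizes['raw'], $escapedSizes['gzip']);
```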

craigfrancis commented 7 years ago

Are you asking to add more characters to the whitelist, so they don't get encoded?

Maybe you could argue that certain characters like ":" don't need to be escaped, but it's easier to have a very small white-list of "known good" characters (a-z, 0-9, and , . - _; everything matching [^a-z0-9,\.\-_] gets encoded), than trying to work out which characters are allowed in each context.
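
The white-list idea can be sketched in a few lines of PHP. This is a simplified illustration, not the library's actual implementation: `sketchEscapeAttr()` is a hypothetical name, it assumes valid UTF-8 input and the mbstring extension (PHP 7.2+ for `mb_ord()`), and it skips details such as named entities and the handling of undefined characters.

```php
<?php
// Hypothetical helper, NOT the library's implementation: encode every
// character outside a small allow-list as a hexadecimal character reference.
function sketchEscapeAttr(string $value): string
{
    $out = preg_replace_callback(
        '/[^a-z0-9,\.\-_]/iu',
        function (array $m): string {
            // mb_ord() returns the Unicode code point of the matched character.
            return sprintf('&#x%02X;', mb_ord($m[0], 'UTF-8'));
        },
        $value
    );

    // With the /u modifier, preg_replace_callback() returns null on invalid UTF-8.
    if ($out === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $out;
}

echo sketchEscapeAttr('https://www.example.com/'), "\n";
// https&#x3A;&#x2F;&#x2F;www.example.com&#x2F;
```

The point of the design is that nothing has to be known about which characters are dangerous in which context: anything not explicitly listed as safe is encoded.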


For anyone not familiar with the background... the reason escapeHtmlAttr() encodes more aggressively than escapeHtml() is for non-quoted attributes.

Let's say someone did:

<?php $url = 'https://www.example.com/'; ?>
<a href=<?= $escaper->escapeHtmlAttr($url) ?>>

Notice that it does not include quote marks.

This creates the fairly "ugly" output:

<a href=https&#x3A;&#x2F;&#x2F;www.example.com&#x2F;>

What happens if $url was provided by the user (maybe a link to their website), and they set it to:

$url = 'https://www.example.com/ onclick=do_evil_thing';

Without using escapeHtmlAttr(), it would create the perfectly valid:

<a href=https://www.example.com/ onclick=do_evil_thing>

This means they can create an onclick event handler on your website :-)
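
To see the difference side by side, here is a small self-contained sketch. It does not use the library itself: the inline allow-list encoder is a simplified stand-in for escapeHtmlAttr() that assumes ASCII-only input, and the variable names are made up for illustration.

```php
<?php
$url = 'https://www.example.com/ onclick=do_evil_thing';

// htmlspecialchars() leaves spaces and "=" untouched, so in an UNQUOTED
// attribute the browser sees a second attribute after the space.
$unsafe = '<a href=' . htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '>';

// Simplified allow-list encoder (illustrative stand-in, ASCII input assumed):
// everything except a-z, 0-9 and , . - _ becomes a hex character reference.
$encoded = preg_replace_callback(
    '/[^a-z0-9,\.\-_]/i',
    function (array $m): string {
        return sprintf('&#x%02X;', ord($m[0]));
    },
    $url
);
$safer = '<a href=' . $encoded . '>';

echo $unsafe, "\n";
// <a href=https://www.example.com/ onclick=do_evil_thing>
echo $safer, "\n";
// <a href=https&#x3A;&#x2F;&#x2F;www.example.com&#x2F;&#x20;onclick&#x3D;do_evil_thing>
```

Because the space and the "=" are encoded in the second variant, the whole value stays inside the href attribute even without quote marks.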


You could still use escapeHtml() or htmlspecialchars(), but you must make sure your attributes are quoted.

<a href="<?= $escaper->escapeHtml($url) ?>">

So that it creates:

<a href="https://www.example.com/">

Or, if you want to use htmlspecialchars(), don't forget to use it in full:

htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'utf-8')
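
As a quick illustration of why the flags matter, the sketch below feeds a made-up malicious value containing double quotes through htmlspecialchars() twice: once with the full flags, and once with ENT_NOQUOTES to show what happens when quotes are left alone. The variable names are only for this example.

```php
<?php
// Made-up input that tries to close the attribute with a double quote.
$url = 'https://www.example.com/" onclick="do_evil_thing';

// Full form: ENT_QUOTES converts both double and single quotes.
$attr = htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo '<a href="' . $attr . '">', "\n";
// <a href="https://www.example.com/&quot; onclick=&quot;do_evil_thing">

// For contrast: ENT_NOQUOTES leaves quotes untouched, so the value can
// break out of the quoted attribute.
$weak = htmlspecialchars($url, ENT_NOQUOTES, 'UTF-8');
echo '<a href="' . $weak . '">', "\n";
// <a href="https://www.example.com/" onclick="do_evil_thing">
```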

PS: Have a look at adding a CSP (Content Security Policy), and set it so that it does not allow unsafe-inline for scripts or styles. This will probably require you to make some changes, but it adds a second line of defence against this problem, where any attributes like onclick would be blocked by the browser.
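
A minimal sketch of sending such a policy from PHP follows. The policy string and the `buildCspHeader()` helper are assumptions for illustration; a real site will usually need more directives (images, fonts, third-party hosts) and some refactoring of inline scripts and styles.

```php
<?php
// Hypothetical helper returning an example policy. Because 'unsafe-inline'
// is not listed for script-src or style-src, the browser refuses inline
// scripts, inline styles, and inline event handlers such as onclick.
function buildCspHeader(): string
{
    return "Content-Security-Policy: default-src 'self'; script-src 'self'; style-src 'self'";
}

// header() only works before any output has been sent, and is pointless
// on the command line, so guard the call.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header(buildCspHeader());
}
```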

froschdesign commented 7 years ago

@craigfrancis Thanks for your explanation! I think this could improve the documentation.

weierophinney commented 4 years ago

This repository has been closed and moved to laminas/laminas-escaper; a new issue has been opened at https://github.com/laminas/laminas-escaper/issues/3.