laminas / laminas-escaper

Securely and safely escape HTML, HTML attributes, JavaScript, CSS, and URLs
https://docs.laminas.dev/laminas-escaper/
BSD 3-Clause "New" or "Revised" License
191 stars 20 forks source link

Inconsistent Behavior for UTF-8 and Other Charsets when Escaping a null value #31

Closed michdingpayc closed 2 years ago

michdingpayc commented 2 years ago

Dear Laminas Team,

When passing a null value to the escaper with its encoding set to UTF-8, the user will see an exception thrown. When passing a null value to the escaper with its encoding set to ISO-8859-1, the user will receive an empty string.

I suspect this has to do with Escaper::toUtf8() method.

It converts to UTF-8 if the given encoding is not UTF-8. If the given encoding is not UTF-8, it does not conversion. Then after any conversion, it checks if the result is valid UTF-8. It does not consider null to be UTF-8 and will throw an exception if the result is null.

However, it uses the Escaper::convertEncoding() method to perform the conversion if the given encoding is not UTF-8. This method relies on mb_convert_encoding which (when given null and "UTF-8" and "ISO-8859-1") will convert the null value to an empty string.

I suspect similar behavior for other encodings being converted to UTF-8 by mb_convert_encoding.

Will you please investigate and resolve the inconsistency?

weierophinney commented 2 years ago

@michdingpayc Can you explain what you feel should be the behavior here? For the case of a null value, should all encoding types result in an empty string, or should they throw exceptions?

With that question answered, we can triage and determine if/how we will proceed.

Ocramius commented 2 years ago

The reason why null still works as input us BC: if it were possible, I would totally enforce strictly requiring string as input 🤔

Meanwhile, are you able to write a test case that highlights the issue with different encodings?

michdingpayc commented 2 years ago

Thank you both for the stupendously quick responses.

It seems my project unconsciously relied on the behavior in which it returned an empty string for ISO-8859-1. Now, migrating the project for UTF-8 encoding, the inconsistency introduces a regression. With this in mind, my feeling is that the desired behavior is for a null value to escape to an empty string because this matches my previous unconscious assumption.

After some further digging, it seems the string type is strictly enforced on the escaping methods, as of 2.8.0. This renders the issues fixed as of release 2.8.0.

You will please forgive me for asking before finding this information. However, it may be good to document this backwards incompatible, introduced in a commit here.

To get an idea of the inconsistency, you may add the following snippet to EscaperTest.php from the 2.7.0 release:


    /**
     * I am simply recycling the provider used above.
     * This should succeed at least once, for when we compare UTF-8 against UTF-8.
     *
     * Note, as of release 2.8.0, these methods require strict typing of string.
     * ref - https://github.com/laminas/laminas-escaper/commit/b877686c78c836ecef905c943595bc30878774fe
     */
    public function testNullReturnsSameValueForUTF8AndOtherSupportedEncodings()
    {
        foreach ($this->supportedEncodings as $encoding) {
            $utf8Escaper = $this->escaper;
            $nonUtf8Escaper = new Escaper($encoding);  // in one data case, this will also be a UTF-8 escaper

            $this->assertEquals($utf8Escaper->escapeCss(null), $nonUtf8Escaper->escapeCss(null));
            $this->assertEquals($utf8Escaper->escapeHtml(null), $nonUtf8Escaper->escapeHtml(null));
            $this->assertEquals($utf8Escaper->escapeHtmlAttr(null), $nonUtf8Escaper->escapeHtmlAttr(null));
            $this->assertEquals($utf8Escaper->escapeJs(null), $nonUtf8Escaper->escapeJs(null));
            $this->assertEquals($utf8Escaper->escapeUrl(null), $nonUtf8Escaper->escapeUrl(null));
        }
    }
Ocramius commented 2 years ago

@michdingpayc thanks for investigating nonetheless! 👍