voku / anti-xss

㊙️ AntiXSS | Protection against Cross-site scripting (XSS) via PHP
MIT License

False positive 'abc < abcd' #134

Open timmit-nl opened 1 year ago

timmit-nl commented 1 year ago

We conducted a pentest on our software and this was a false positive that came out:

What is this feature about (expected vs actual behaviour)?

If a < is followed by a space and a letter (a-z), xss_clean encodes it. For example, 'test1 < test2' becomes 'test1 &lt; test2', but 'test1 > test2' stays 'test1 > test2'.

How can I reproduce it?

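$test = 'test1 < test2';

$antiXSS = new \voku\helper\AntiXSS();
$testResult = $antiXSS->xss_clean($test);

// The input contains no markup, so it should come back unchanged.
if ($test !== $testResult) {
    echo 'failed';
}

// Nothing in the input is an attack, so no XSS should be reported.
if ($antiXSS->isXssFound()) {
    echo 'false positive';
}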

Does it take minutes, hours or days to fix?

I really don't know. If I understood the package better I could maybe write a fix, but I don't know where to start...

Any additional information?

Loafy-wb commented 11 months ago

I'm having the same issue.

For example <p>test1 < test2</p> is turned into <p>test1 &lt; test2&lt;/p&gt;

It seems to come from the _close_html_callback function, which assumes the < is the start of an HTML tag, so everything after it is encoded. In this case, it treats < test2</p> as an HTML tag.
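To illustrate the kind of match described above, here is a minimal sketch using a deliberately simplified tag pattern. The pattern and the function name findTagLikeSpans are hypothetical and are not taken from anti-xss; the library's real regex may differ. The sketch only shows how a "< up to the next >" rule swallows the closing </p>.

```php
<?php
// Hypothetical, simplified tag pattern; NOT the regex anti-xss actually uses.
// It treats everything from a "<" up to the next ">" as one tag.
function findTagLikeSpans(string $html): array
{
    preg_match_all('/<[^>]+>/', $html, $matches);

    return $matches[0];
}

print_r(findTagLikeSpans('<p>test1 < test2</p>'));
// The second match is "< test2</p>": the lone "<" is read as a tag opener,
// so the real closing tag ends up inside the bogus "tag".
```

Under this assumption, any rule that encodes or strips the matched span would also hit the </p>, which is consistent with the output reported above.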

Loafy-wb commented 11 months ago

After checking other issues, this appears to be similar to https://github.com/voku/anti-xss/issues/83 but the fix mentioned there does not seem to fix this for me.

With the update in this commit https://github.com/mathiasselleslach/anti-xss/commit/2a65b16ee20b7896e73eac3cd227bc11b1fe3db4, it does change the result slightly for me but it doesn't solve the issue.

Now <p>test1 < test2</p> is turned into <p>test1 < test2&lt;/p>, which is closer but still not fixed.

voku commented 11 months ago

I am not sure if we should fix it here. Maybe we can use a DOM parser (e.g. https://github.com/voku/simple_html_dom/blob/master/tests/HTML5DOMDocumentTest.php) and auto-correct the given HTML? 🤔
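A rough sketch of the "parse and re-serialize" idea, using PHP's bundled DOMDocument instead of the simple_html_dom library linked above (a swap made only to keep the example dependency-free). The function name normalizeHtmlFragment is hypothetical. How a lone < is recovered depends on the installed libxml2 version, so check the output before relying on it; character encoding is also ignored here, so treat it as ASCII-only.

```php
<?php
// Sketch only: re-serialize an HTML fragment through PHP's DOM extension.
// normalizeHtmlFragment is a hypothetical helper, not part of anti-xss.
function normalizeHtmlFragment(string $html): string
{
    // Collect libxml parse warnings instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Wrap the fragment so its nodes can be located again after parsing.
    $dom->loadHTML('<div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $dom->getElementsByTagName('div')->item(0);
    if ($wrapper === null) {
        return '';
    }

    // Serialize only the wrapper's children, not the implied html/body.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return $out;
}

echo normalizeHtmlFragment('<p>test1 < test2</p>'), PHP_EOL;
```

Whether the corrected markup then passes through xss_clean unchanged has not been checked here.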

Result of the w3c validation:

Error: Bad character   after <. Probable cause: Unescaped <. Try escaping it as &lt;.
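Following the validator's hint, one possible workaround is to escape such a < before the string reaches xss_clean. The sketch below is an assumption-laden example, not a fix from the library: escapeLoneLessThan is a hypothetical helper, and it only handles a < followed by whitespace or the end of the input, where it cannot start a tag. Whether xss_clean then leaves the already-encoded &lt; alone, and whether isXssFound stays false, is not verified here.

```php
<?php
// Hypothetical pre-processing helper, not part of anti-xss.
function escapeLoneLessThan(string $html): string
{
    // Encode "<" only when whitespace or the end of the input follows it.
    // A "<" directly followed by a letter or "/" is left untouched.
    $escaped = preg_replace('/<(?=\s|$)/', '&lt;', $html);

    // preg_replace returns null on a regex error; fall back to the input.
    return $escaped ?? $html;
}

echo escapeLoneLessThan('<p>test1 < test2</p>'), PHP_EOL;
// prints: <p>test1 &lt; test2</p>
```

This changes the input on purpose, so it fits cases where the text is later rendered as HTML; it does nothing for a bare < that is immediately followed by a letter.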