Open timmit-nl opened 1 year ago
I'm having the same issue.
For example, <p>test1 < test2</p>
is turned into <p>test1 &lt; test2&lt;/p&gt;
It seems to come from the _close_html_callback function, which assumes the < is the start of an HTML tag, so everything after it is encoded. In this case, it treats < test2</p> as an HTML tag.
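To illustrate the distinction being discussed, here is a minimal sketch in Python (not the library's PHP, and not how _close_html_callback is actually implemented). It assumes the HTML tokenizer rule that a < only opens markup when it is followed by a letter, /, ! or ?; any other < is plain text and can be escaped on its own. The function name escape_stray_lt is hypothetical.

```python
import re

# A "<" that is NOT followed by a letter, "/", "!" or "?" cannot start
# a tag, end tag, comment or doctype, so it is treated as literal text.
STRAY_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_stray_lt(text: str) -> str:
    """Escape only the stray "<" characters and leave real markup untouched."""
    return STRAY_LT.sub("&lt;", text)


print(escape_stray_lt("<p>test1 < test2</p>"))
```

Note that this check is purely lexical: input such as test1 <test2 (no space after the <) still looks like a tag start and is left as-is, so it would need separate handling.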
After checking other issues, this appears to be similar to https://github.com/voku/anti-xss/issues/83 but the fix mentioned there does not seem to fix this for me.
With the update in this commit https://github.com/mathiasselleslach/anti-xss/commit/2a65b16ee20b7896e73eac3cd227bc11b1fe3db4, it does change the result slightly for me but it doesn't solve the issue.
Now <p>test1 < test2</p>
is turned into <p>test1 &lt; test2</p>, which is closer but still not fixed.
I am not sure if we should fix it here. Maybe we can use a DOM parser (e.g. https://github.com/voku/simple_html_dom/blob/master/tests/HTML5DOMDocumentTest.php) and auto-correct the given HTML? 🤔
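As a rough sketch of what "parse and auto-correct" could look like, the following uses Python's standard html.parser instead of the PHP DOM parser linked above, purely for illustration. It re-serializes the input and escapes any < or & that the parser reports as text. The names Normalizer and normalize_html are hypothetical, and comments and doctypes are dropped for brevity.

```python
from html import escape
from html.parser import HTMLParser


def _render_attrs(attrs):
    # Valueless attributes (e.g. "disabled") come through with value None.
    return "".join(
        f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
        for name, value in attrs
    )


class Normalizer(HTMLParser):
    """Re-serialize HTML, escaping stray "<" and "&" found in text nodes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}{_render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}{_render_attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives already unescaped, so escaping it again is safe;
        # ">" is left alone because it is harmless in text content.
        self.parts.append(data.replace("&", "&amp;").replace("<", "&lt;"))


def normalize_html(fragment: str) -> str:
    parser = Normalizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


print(normalize_html("<p>test1 < test2</p>"))
```

Running the normalized output through the same function again leaves it unchanged, so a step like this could sit in front of the sanitizer without double-encoding already-escaped input.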
Result of the W3C validation:
Error: Bad character after <. Probable cause: Unescaped <. Try escaping it as &lt;.
We conducted a pentest on our software, and this came up as a false positive:
What is this feature about (expected vs actual behaviour)?
If there is a < followed by a-z, it is changed by xss_clean. For example:
'test1 < test2'
becomes: 'test1 &lt; test2'
But 'test1 > test2'
will stay 'test1 > test2'
How can I reproduce it?
Does it take minutes, hours or days to fix?
I really don't know. If I understood the package better I could maybe write a fix, but I don't know where to start...
Any additional information?