google / gumbo-parser

An HTML5 parsing library in pure C99
Apache License 2.0

Possible XSS in serializers that use gumbo_tag_from_original_text #381

Closed duvduvfb closed 1 year ago

duvduvfb commented 7 years ago

gumbo_tag_from_original_text currently uses isspace to detect the whitespace that terminates a tag name.

isspace also matches \v and \r, which the spec does not treat as whitespace in the tag name state, so they are legal inside a tag name (https://html.spec.whatwg.org/multipage/syntax.html#tag-name-state).

This can result in an XSS that would not be possible with a standards-compliant parser: in the current implementation, gumbo_tag_from_original_text returns script for the unknown element script\v (or script\r).

Serializers relying on gumbo_tag_from_original_text (such as prettyprint) will transform non-executable <script\v> tags into executable <script> tags.

I had a PR ready with a fix and tests, but for legal reasons I can't sign the CLA. Let me know if it's OK for someone else to merge it and I'll link to the diff.

To fix this, the isspace call in gumbo_tag_from_original_text should be replaced with a check against the exact list of characters the spec gives, and test cases for parsing <script\v> and similar inputs should be added.
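
For illustration only, here is a minimal sketch of what such a check could look like. It is not the diff mentioned above and not gumbo's actual code: the helper names is_html_tag_name_space, tag_name_length and check_examples are hypothetical. It assumes the spec's tag name state treats only tab, line feed, form feed and space as whitespace, and that the input is the text just after the opening `<`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of gumbo's API: true only for the four
 * characters the spec's tag name state treats as whitespace
 * (tab, line feed, form feed, space). */
static bool is_html_tag_name_space(unsigned char c) {
  return c == '\t' || c == '\n' || c == '\f' || c == ' ';
}

/* Hypothetical helper: length of the tag name at the start of `text`
 * (the bytes just after '<'), stopping at spec whitespace, '/' or '>'. */
static size_t tag_name_length(const char* text, size_t length) {
  size_t i = 0;
  while (i < length) {
    unsigned char c = (unsigned char) text[i];
    if (is_html_tag_name_space(c) || c == '/' || c == '>') break;
    ++i;
  }
  return i;
}

/* Small self-check showing where isspace and the spec list disagree. */
static void check_examples(void) {
  /* isspace accepts \v and \r; the spec list does not. */
  assert(isspace('\v') && !is_html_tag_name_space('\v'));
  assert(isspace('\r') && !is_html_tag_name_space('\r'));
  /* "script\v" keeps its 7-byte unknown tag name instead of becoming "script". */
  assert(tag_name_length("script\v>", 8) == 7);
  /* A real space still ends the name after "script". */
  assert(tag_name_length("script >", 8) == 6);
}
```

With a check like this, a serializer that reuses the original tag text would emit script\v unchanged rather than shortening it to script.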