This has the makings of a great sanitization library, but right now it appears to have some vulnerabilities, based on a quick read-through of the clear and well-written code.
https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.md
To quote the first cheat sheet: "Even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into."
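To make the quoted point concrete, here is a minimal Python sketch (not taken from the library under review) of one context where entity encoding does nothing useful: a URL placed in an href attribute. The function names and the ALLOWED_SCHEMES allow-list are assumptions for illustration only.

```python
import html
from urllib.parse import urlsplit

# Illustrative allow-list; an assumption, not something the cheat sheet prescribes.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def render_link_naive(url: str) -> str:
    """Entity-encode the URL and drop it into href (right for text, wrong for URLs)."""
    return '<a href="{}">link</a>'.format(html.escape(url, quote=True))


def render_link_contextual(url: str) -> str:
    """Check the scheme first (URL context), then entity-encode (attribute context)."""
    scheme = urlsplit(url.strip()).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        url = "#"  # replace anything not explicitly allowed, including relative URLs
    return '<a href="{}">link</a>'.format(html.escape(url, quote=True))


payload = "javascript:alert(1)"
print(render_link_naive(payload))       # <a href="javascript:alert(1)">link</a>
print(render_link_contextual(payload))  # <a href="#">link</a>
```

The payload contains none of the characters that entity encoding touches, so the naive version emits a live javascript: link. The contextual version is deliberately strict (it also rejects relative URLs) and is only meant to show that the check has to match the context.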
It might be useful to develop a test suite based on this: https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
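As a rough idea of what such a suite could look like, the Python sketch below runs a few payloads modeled on the kind found in that cheat sheet through a sanitizer, parses the result, and reports anything a parser would still treat as active content. The name sanitize is a hypothetical stand-in for the library's entry point (here it just entity-encodes), and the payload list and checks are far from exhaustive.

```python
import html
from html.parser import HTMLParser


def sanitize(untrusted: str) -> str:
    """Hypothetical stand-in for the library's sanitizer; swap in the real call."""
    return html.escape(untrusted, quote=True)


# Payloads modeled on typical filter-evasion entries (illustrative, not exhaustive).
PAYLOADS = [
    "<SCRIPT SRC=https://example.com/xss.js></SCRIPT>",
    "<IMG SRC=\"javascript:alert('XSS');\">",
    "<IMG SRC=JaVaScRiPt:alert('XSS')>",
    "<BODY ONLOAD=alert('XSS')>",
]


class DangerFinder(HTMLParser):
    """Collects script tags, on* handlers and javascript: URLs seen by the parser."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.findings.append("script tag")
        for name, value in attrs:
            if name.startswith("on"):
                self.findings.append("event handler " + name)
            if name in ("href", "src") and (value or "").strip().lower().startswith("javascript:"):
                self.findings.append("javascript: URL in " + name)


def find_dangers(markup: str) -> list:
    finder = DangerFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings


def failing_payloads(sanitizer) -> list:
    """Return the payloads whose sanitized output still contains active content."""
    return [p for p in PAYLOADS if find_dangers(sanitizer(p))]


print(failing_payloads(sanitize))     # []
print(len(failing_payloads(lambda s: s)))  # 4, i.e. every payload fails unsanitized
```

The same loop drops straight into a parametrized test in whatever framework the project uses. Note that this only checks output placed in HTML element content; payloads inserted into attributes, URLs, or scripts need their own cases.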
For example, escaping only <> isn't enough. OWASP used to have a list (as follows), but now even this isn't sufficient.
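A short Python sketch of why this matters, using hypothetical helper names and the standard library's html.escape as a stand-in for "escape the usual five characters": escaping only the angle brackets lets a payload break out of a quoted attribute, and even the fuller escape does not help once the attribute is unquoted.

```python
import html


def escape_angle_only(s: str) -> str:
    """Escapes only < and >, as a deliberately incomplete example."""
    return s.replace("<", "&lt;").replace(">", "&gt;")


def quoted_attr(value: str) -> str:
    return '<input value="{}">'.format(value)


def unquoted_attr(value: str) -> str:
    return "<input value={}>".format(value)


quoted_payload = '" onmouseover="alert(1)'
unquoted_payload = "x onmouseover=alert(1)"

# Angle-only escaping: the double quote closes the attribute and adds a handler.
print(quoted_attr(escape_angle_only(quoted_payload)))
# <input value="" onmouseover="alert(1)">

# Escaping &, <, >, " and ' keeps the payload inside the quoted attribute.
print(quoted_attr(html.escape(quoted_payload, quote=True)))
# <input value="&quot; onmouseover=&quot;alert(1)">

# In an unquoted attribute a space is enough, and html.escape leaves it alone.
print(unquoted_attr(html.escape(unquoted_payload, quote=True)))
# <input value=x onmouseover=alert(1)>
```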
Also have a look at how https://github.com/microcosm-cc/bluemonday does it.
This is another OWASP cheat sheet that might be valuable:
https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Input_Validation_Cheat_Sheet.md