Closed mjanda closed 1 year ago
When I run the following series of commands:
$ cat > test.html
let foo = "xxx¤xxx";
let bar = Builder::new().clean(&foo);
println!("{:?}", bar);
$ firefox ./test.html
I get this screenshot, showing that Firefox agrees with ammonia's parsing.
Ammonia is supposed to parse its input the same way a Web browser would. If you want to escape special characters, instead of escaping tags, you need to use the clean_text
function instead of clean
.
Ouch. Thank you for explanation. That's unfortunate. I provided simplest test case I could, but my problems actually occured when user pasted link:
Running it through ammonia makes it malformed :-/
From what I've been able to find, optional semicolon is supported only for legacy reasons and current standard makes them mandatory https://html.spec.whatwg.org/multipage/syntax.html#character-references
Would it be possible to support this more strict parsing? Or is it inherited behavior from html library and ammonia can do nothing about it?
I don't think the HTML library is flexible enough to turn individual fixups like this off
I can make it produce validation errors. Would that work?
Alternatively, you could use regexes to run your own tweaks to the HTML before feeding it to Ammonia. The important part is that any bespoke transformations should be done before feeding it into Ammonia; that way, if there's a bug in your transformation code, it won't potentially cause an XSS vulnerability.
Here's a Playground with what I'd do if I wanted to accept user-supplied HTML from a form or something:
I will explore how difficult it would be to first apply my transformations before ammonia. Now, I'm doing it after.
Thank you for explanation. I'll close this issue as it clearly isn't bug and there is no way to easily solve it here.
Hello, it seems like ¤ without ending ; is rewritten to ¤?
Outputs
Document(xxx¤xxx)
Same with £ ->
Document(xxx£xxx)
and probably many more...ammonia 3.3.0