Open bcaller opened 1 year ago
I've found a few cases where feeding ammonia's output back into ammonia produces a different output.
I'm not sure whether this means the initial output is non-compliant or potentially unsafe, or whether you'd consider it not a bug at all.
It's also possible the bug is entirely within html5ever; I'm not sure.
In the first two examples, entity decoding produces characters that the second pass then removes or changes.
The later examples show the sanitizer moving closing tags around.
Anyway, do you think it's worth running ammonia twice, or is this nothing to worry about?
HTML entity -> \r -> \n
HTML entity for BOM at start -> BOM at start -> nothing (OK, this one I understand, because we use the default TokenizerOpts with discard_bom):
(BOM entity)! -> \ufeff! -> !
Anchor tag hopping around (input -> first pass -> second pass):
<a><table><a> -> <a><a></a><table></table></a> -> <a></a><a></a><table></table>
<h1><a><h6></a></h6> -> <h1><a></a><h6><a></a></h6></h1> -> <h1><a></a></h1><h6><a></a></h6>
Paragraph tags reproducing:
<p><svg><foreignobject><p> -> <p><p></p></p> -> <p></p><p></p><p></p>
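
To make the "run it twice" idea concrete, here is a minimal sketch of a helper that re-runs a sanitizer on its own output until the result stops changing. Both `sanitize_until_stable` and `toy_sanitize` are hypothetical names made up for illustration. `toy_sanitize` is not ammonia; it only mimics the \r case above (raw carriage returns are normalized on input, then the entity is decoded) so the snippet runs without any dependencies. With the real crate, I'd assume you could pass `|s| ammonia::clean(s)` instead.

```rust
/// Toy stand-in for a sanitizer (NOT ammonia). Like an HTML parser, it first
/// normalizes raw carriage returns in the input to newlines, then decodes the
/// `&#13;` character reference, so a raw `\r` can end up in its output.
fn toy_sanitize(input: &str) -> String {
    input
        .replace("\r\n", "\n")
        .replace('\r', "\n")
        .replace("&#13;", "\r")
}

/// Runs `sanitize` once, then re-runs it on its own output until the output
/// stops changing. Gives up after `max_passes` re-checks and returns `None`.
/// On success, returns the stable output and the number of passes that were
/// needed to produce it.
fn sanitize_until_stable<F>(sanitize: F, input: &str, max_passes: usize) -> Option<(String, usize)>
where
    F: Fn(&str) -> String,
{
    let mut current = sanitize(input);
    for pass in 1..=max_passes {
        let next = sanitize(&current);
        if next == current {
            // `current` is a fixed point: another pass changes nothing.
            return Some((current, pass));
        }
        current = next;
    }
    None
}
```

Calling `sanitize_until_stable(toy_sanitize, "a&#13;b", 5)` needs two passes with the toy function, matching the entity -> \r -> \n behaviour described above. Whether a second pass is actually needed for safety with ammonia is exactly the open question here; this only shows how one could detect that the output has not yet stabilized.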