Closed Nutomic closed 1 year ago
This doesn't seem like a bug.
Both clean
functions are supposed to accept HTML, not unformatted text, and they produce HTML as their output. You shouldn't be sending markdown through them.
For your usecase, I recommend sending your text through the markdown processor first, then sending its output HTML through ammonia.
That would only work if ammonia respects <pre>
tags and doesnt change anything inside them. Is that the case?
Why would <pre>
tags not change anything inside them? I don't see anything special about them in the html parser [^except] or serialization rules for them. That's why, if you write this HTML:
[^except]: <pre>
tags aren't allowed to be nested within paragraph tags, so they have a special parser rule for closing any unclosed <p>
tags. The important part is that the tokenizer mode isn't changed the same way, for example, <script>
does.
<pre><a href="https://rust-lang.org">Rust</a></pre>
And you get this:
Rust
That might be true in HTML, but when you write a markdown code block like the following, you expect it to be rendered unchanged, which Github indeed does.
<script>alert("xss example");</script>
Or
docker-compose pull && docker-compose up -d
We also have a different, but related problem because we need to escape the same string twice in some cases. This is necessary because each time it is untrusted in a different context. Which results in &
turning to &amp;
.
All in all its really complicated to find a proper solution.
Can you wrap the code blocks in <![CDATA[]]>
?
If I write a literal <script>
tag in GFM, it gets filtered out before being shown in the web page (of course), but it's actually still there in the database, and I can see it in the API.
See the comment thread here: https://api.github.com/repos/rust-ammonia/ammonia/issues/comments/1665839239 In other words, their API deals in raw, unmodified user data with no escaping, and it eventually gets escaped *exactly once*. No double-escaping. (probably; it's proprietary software, so I can't actually check, but that's what it looks like it does)
Im closing this issue because ammonia isnt the right tool for our use case. Instead we are going to manually replace special html characters with escape codes.
The
clean
function in ammonia 3.3 unexpectedly replaces HTML special characters like&
with&
, same for<
,>
and so on. Based on the docs and code, this should only happen with theclean_text
function which Im not invoking at all. In fact I cant tell where these character replacements are happening or how to disable them. Any hints?Edit: To reproduce, the following assertion fails but I would expect it to pass: