rust-ammonia / ammonia

Repair and secure untrusted HTML
Apache License 2.0

'html', 'head' and 'body' tags are stripped out even if these are included in the whitelisted tags #183

Open Muntaner opened 1 year ago

Muntaner commented 1 year ago

Minimal example:

    use maplit::hashset;

    fn main() {
        let html = "<html><head>head content</head><body><div>test</div></body></html>";

        // Allow the document-level tags in addition to ammonia's defaults.
        let tags = hashset!["html", "head", "body"];
        let mut b = ammonia::Builder::default();
        b.add_tags(tags);

        // Sanitize the input and print the result.
        let clean_html = b.clean(html).to_string();
        println!("{}", clean_html);
    }

Output: head content<div>test</div>

Expectation: <html><head>head content</head><body><div>test</div></body></html>

Am I overlooking some setting?

medihack commented 1 year ago

Same thing for some other tags, like strong. Any help?

notriddle commented 1 year ago

html, head, and body are more-or-less expected. The HTML is parsed as if it were a div's innerHTML.

strong shouldn't do that. Can you open a separate issue with a minimized code example?

Muntaner commented 1 year ago

> html, head, and body are more-or-less expected. The HTML is parsed as if it were a div's innerHTML.

Does this mean that it is working as designed (I doubt that, due to the "more-or-less") or is there any plan to support such tags?

IMHO it could be very useful. Right now, passing a full-fledged HTML document to the library for sanitization is basically unsupported, since the result would "break" the original document.
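
Since the input is treated as a fragment, one possible workaround is to sanitize it as such and then embed the result in a fixed document shell that you control. The following is only a minimal sketch under that assumption: `sanitize_into_document` is a hypothetical helper, not part of ammonia, and the shell markup is an arbitrary choice. The sanitizer is passed in as a closure so the sketch does not depend on any particular crate.

```rust
/// Hypothetical helper (not an ammonia API): sanitize `untrusted` as an
/// HTML fragment, then embed the result in a fixed, trusted document shell.
pub fn sanitize_into_document<F>(untrusted: &str, sanitize: F) -> String
where
    F: Fn(&str) -> String,
{
    // The sanitizer sees the whole input. Any <html>, <head>, or <body>
    // tags it drops are replaced by the shell below.
    let fragment = sanitize(untrusted);

    // Assumed shell; adjust the head contents to your needs.
    format!(
        "<!DOCTYPE html><html><head><meta charset=\"utf-8\"></head><body>{}</body></html>",
        fragment
    )
}
```

With ammonia as a dependency, this could be called as `sanitize_into_document(html, ammonia::clean)`. Note that any text from the original head that survives sanitization (such as "head content" in the example above) ends up inside the new body, so this does not preserve the original document structure.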