rust-ammonia / ammonia

Repair and secure untrusted HTML
Apache License 2.0

Parsing full html documents #202

Open dev-ardi opened 3 months ago

dev-ardi commented 3 months ago

Fixes https://github.com/rust-ammonia/ammonia/issues/183

html5ever supports parsing full documents, so let's expose that option.

What are the acceptable tags and attributes that we should support by default?

notriddle commented 3 months ago

Instead of implicitly adding a bunch of new tags when switching mode, perhaps add a new constructor method that creates a builder with the flag turned on? Like this?

/// Create a new parser in "document mode",
/// instead of the default fragment mode.
///
/// In addition to the normal set of allowed tags,
/// this also enables `html`, `head`, `title`,
/// and `body`.
pub fn new_as_document() -> Self {
    let mut result = Builder::new();
    result.is_document = true;
    result.add_tags(["html", "head", "title", "body"]);
    result
}

dev-ardi commented 3 months ago

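Adding the allowed tags/attributes in the same operation as setting is_document may not be a good idea, because Builder::empty exists: a caller who starts from an empty allow-list would presumably want document mode without any tags being added on their behalf.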
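
One way to picture the concern is to keep the mode switch and the tag defaults as two separate calls. The sketch below uses a toy `Builder` with only the standard library. It is not ammonia's real type; `document_mode`, `add_document_tags`, and the three-tag default list are hypothetical names and values made up for illustration.

```rust
use std::collections::HashSet;

/// Toy stand-in for a sanitizer builder. NOT ammonia's real `Builder`.
#[derive(Debug, Default)]
struct Builder {
    tags: HashSet<&'static str>,
    /// Hypothetical flag taken from the discussion above.
    is_document: bool,
}

impl Builder {
    /// Small placeholder allow-list standing in for the real defaults.
    fn new() -> Self {
        let mut b = Self::empty();
        b.add_tags(["a", "p", "b"]);
        b
    }

    /// Nothing is allowed until the caller adds it.
    fn empty() -> Self {
        Self::default()
    }

    fn add_tags<I: IntoIterator<Item = &'static str>>(&mut self, it: I) -> &mut Self {
        self.tags.extend(it);
        self
    }

    /// Hypothetical setter: flips the parse mode only and leaves the allow-list alone.
    fn document_mode(&mut self, value: bool) -> &mut Self {
        self.is_document = value;
        self
    }

    /// Hypothetical convenience: explicit opt-in to the document-level tags.
    fn add_document_tags(&mut self) -> &mut Self {
        self.add_tags(["html", "head", "title", "body"])
    }
}

fn main() {
    // Starting from `empty`, document mode adds no tags behind the caller's back.
    let mut strict = Builder::empty();
    strict.document_mode(true);
    assert!(strict.is_document && strict.tags.is_empty());

    // Callers who want the document-level tags ask for them explicitly.
    let mut full = Builder::new();
    full.document_mode(true).add_document_tags();
    assert!(full.tags.contains("body"));

    println!("strict: {} tags, full: {} tags", strict.tags.len(), full.tags.len());
}
```

With this split, a constructor like the proposed `new_as_document` could still exist as a thin wrapper over the two calls, while `Builder::empty` users keep full control of the allow-list.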