kuchiki-rs / kuchiki

(朽木) HTML/XML tree manipulation library for Rust
MIT License
470 stars 54 forks source link

Xml escaped string with script tag. #86

Closed ArtBlnd closed 3 years ago

ArtBlnd commented 3 years ago

Seems newly generated Text node in Script node is generating script to Xml escaped string.

fn create_element(name: &str, att: Option<Vec<(&str, &str)>>) -> NodeRef {
    let att = if let Some(att) = att {
        att.iter()
            .map(|(k, v)| {
                (
                    ExpandedName::new(ns!(), LocalName::from(*k)),
                    Attribute {
                        prefix: None,
                        value: v.to_string(),
                    },
                )
            })
            .collect()
    } else {
        Vec::new()
    };

    let local_name = if name == "script" {
        // I've already replaced local name to cached local name instead creating new local name. but it still does not work.
        local_name!("script")
    } else {
        LocalName::from(name)
    };

    NodeRef::new_element(QualName::new(None, ns!(), local_name), att)
}

let script_el = create_element("script", None);
script_el.append(NodeRef::new_text(include_str!("../script.js" /* non-escaped script! */)));

Inside html5ever. mod.rs#221


fn write_text(&mut self, text: &str) -> io::Result<()> {
let escape = match self.parent().html_name {
Some(local_name!("style")) |
Some(local_name!("script")) |
Some(local_name!("xmp")) |
Some(local_name!("iframe")) |
Some(local_name!("noembed")) |
Some(local_name!("noframes")) |
Some(local_name!("plaintext")) => false,
    Some(local_name!("noscript")) => !self.opts.scripting_enabled,

    _ => true,
};

if escape {
    self.write_escaped(text, false)
} else {
    self.writer.write_all(text.as_bytes())
}

}

jdm commented 3 years ago

Do you have a unit test that demonstrates the problem more clearly?

ArtBlnd commented 3 years ago
1. Create empty document.
2. Create script element
3. Create text node and append to script element
4. Serialize document to string.

Then you can see script in the element is xml escaped.

ArtBlnd commented 3 years ago

In this case, the script text in the element should NOT xml escaped by that html5ever's implementation. it checks parent element is script.

jdm commented 3 years ago

Can you share the actual code that demonstrates this, including the serializing code?

ArtBlnd commented 3 years ago

If you need unit test for reproduce. I'll write new one for you. our test is highly depended on our infrastructure. so I need to write new one.

ArtBlnd commented 3 years ago
#[cfg(test)]
mod tests {
    use kuchiki::traits::*;
    use kuchiki::{parse_html, Attribute, ExpandedName, NodeRef};
    use markup5ever::*;

    #[test]
    fn test() {
        fn create_element(name: &str, att: Option<Vec<(&str, &str)>>) -> NodeRef {
            let att = if let Some(att) = att {
                att.iter()
                    .map(|(k, v)| {
                        (
                            ExpandedName::new(ns!(), LocalName::from(*k)),
                            Attribute {
                                prefix: None,
                                value: v.to_string(),
                            },
                        )
                    })
                    .collect()
            } else {
                Vec::new()
            };

            let local_name = if name == "script" {
                local_name!("script")
            } else {
                LocalName::from(name)
            };

            NodeRef::new_element(QualName::new(None, ns!(), local_name), att)
        }

        let document = parse_html().one("");
        let head = document.select_first("head").unwrap();

        let script = create_element("script", None);
        script.append(NodeRef::new_text("some_script>another_script"));
        head.as_node().append(script);

        assert_eq!(document.to_string(), "<html><head><script>some_script>another_script</script></head><body></body></html>");
    }
}
ArtBlnd commented 3 years ago

Expect result is but <html><head><script>some_script>another_script</script></head><body></body></html> <html><head><script>some_script&gt;another_script</script></head><body></body></html>

jdm commented 3 years ago

I believe the problem is that you're using ns!(), instead of ns!(html) when creating the element.

ArtBlnd commented 3 years ago

Seems it has same issue with ns!(html)

ArtBlnd commented 3 years ago

Ah nevermind. fixed!