I noticed that html5ever parses this document in an unexpected way:
<html>
<head></head>
<body>
<p>one
<p>two</p>
three
</p>
</body>
</html>
I parse it with the following program:
use html5ever::tendril::TendrilSink;
use markup5ever_rcdom as rcdom;
use rcdom::{NodeData, RcDom};

fn main() {
    let source = "<html><head></head><body><p>one<p>two</p>three</p></body></html>";
    let dom: RcDom =
        html5ever::driver::parse_document(RcDom::default(), Default::default()).one(source);

    // Print the tree starting at the first child of the Document node (<html>).
    let doc = &dom.document;
    let root = &doc.children.borrow()[0];
    print_tree(root, 0);

    if !dom.errors.is_empty() {
        println!("\nParse errors:");
        for err in dom.errors.iter() {
            println!(" {}", err);
        }
    }
}

fn print_tree(node: &rcdom::Handle, level: usize) {
    // One space of indentation per tree level.
    let padding = " ".repeat(level);
    match &node.data {
        // Only the tag name is needed; the other fields are ignored.
        NodeData::Element { name, .. } => {
            println!(
                "{padding}<{}> num_children={}",
                &name.local,
                node.children.borrow().len()
            );
            for child in node.children.borrow().iter() {
                print_tree(child, level + 1);
            }
            println!("{padding}</{}>", &name.local);
        }
        NodeData::Text { contents } => println!("{padding}{}", contents.borrow().as_ref()),
        // Comments, doctypes, etc. do not occur in this input.
        _ => todo!(),
    }
}
This outputs:
<html> num_children=2
 <head> num_children=0
 </head>
 <body> num_children=4
  <p> num_children=1
   one
  </p>
  <p> num_children=1
   two
  </p>
  three
  <p> num_children=0
  </p>
 </body>
</html>

Parse errors:
 Unexpected token
 No <p> tag to close
but I expected "three" to be contained in the outer <p> element, like below:
<html> num_children=2
 <head> num_children=0
 </head>
 <body> num_children=1
  <p> num_children=3
   one
   <p> num_children=1
    two
   </p>
   three
  </p>
 </body>
</html>
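For comparison, the two WHATWG HTML tree-construction rules that involve p tags in this input can be modelled with a toy parser: a <p> start tag first closes any p element that is still open, and a </p> end tag with no open p is a parse error that inserts an empty p element. This is a simplified sketch under stated assumptions (it ignores scope boundaries and every element other than p inside <body>), not html5ever's implementation; the Tok, Node, parse and render names are hypothetical.

```rust
// Toy model (NOT html5ever's code) of two WHATWG "in body" rules:
//  1. A <p> start tag closes any currently open <p> first.
//  2. A </p> end tag with no open <p> is a parse error; an empty <p>
//     is inserted and immediately closed.
// Assumption: scope boundaries (button scope) are ignored for brevity.

enum Tok<'a> {
    StartP,
    EndP,
    Text(&'a str),
}

enum Node {
    P(Vec<Node>),
    Text(String),
}

/// Builds the children of <body> from a flat token stream and
/// returns them together with the number of parse errors seen.
fn parse(tokens: &[Tok]) -> (Vec<Node>, usize) {
    let mut body: Vec<Node> = Vec::new();
    // Children of the currently open <p>, if any.
    let mut open_p: Option<Vec<Node>> = None;
    let mut errors = 0;

    for tok in tokens {
        match tok {
            Tok::StartP => {
                // Rule 1: close an open <p> before opening a new one.
                if let Some(children) = open_p.take() {
                    body.push(Node::P(children));
                }
                open_p = Some(Vec::new());
            }
            Tok::EndP => match open_p.take() {
                Some(children) => body.push(Node::P(children)),
                None => {
                    // Rule 2: a stray </p> becomes an empty <p>.
                    errors += 1;
                    body.push(Node::P(Vec::new()));
                }
            },
            Tok::Text(s) => {
                let node = Node::Text(s.to_string());
                match open_p.as_mut() {
                    Some(children) => children.push(node),
                    None => body.push(node),
                }
            }
        }
    }
    // A <p> still open at end of input is closed implicitly.
    if let Some(children) = open_p {
        body.push(Node::P(children));
    }
    (body, errors)
}

/// Renders the nodes in a compact one-line form, e.g. "<p>one</p>three".
fn render(nodes: &[Node]) -> String {
    nodes
        .iter()
        .map(|n| match n {
            Node::P(children) => format!("<p>{}</p>", render(children)),
            Node::Text(s) => s.clone(),
        })
        .collect()
}

fn main() {
    // Tokens for <p>one<p>two</p>three</p> inside <body>.
    let tokens = [
        Tok::StartP,
        Tok::Text("one"),
        Tok::StartP,
        Tok::Text("two"),
        Tok::EndP,
        Tok::Text("three"),
        Tok::EndP,
    ];
    let (body, errors) = parse(&tokens);
    println!("{}", render(&body));
    println!("parse errors: {}", errors);
}
```

Running it prints <p>one</p><p>two</p>three<p></p> and reports one parse error, which has the same shape as the four <body> children html5ever produced above.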