Closed hipstermojo closed 1 year ago
When I run the following program:
use kuchiki::traits::TendrilSink;
use kuchiki::NodeData;
fn main() {
let node = kuchiki::parse_html().one("<div>a</div> <span></span>");
node.descendants().filter(|x| match x.data(){
NodeData::Text(t) => t.borrow().trim().is_empty(),
_ => false,
}).for_each(|x| x.detach());
for n in node.inclusive_descendants() {
println!("{:?}", n);
}
}
I get:
NodeRef(Document(DocumentData { _quirks_mode: Cell { value: Quirks } }) @ 0x7fe053405bd0)
NodeRef(Element(ElementData { name: QualName { prefix: None, ns: Atom('http://www.w3.org/1999/xhtml' type=static), local: Atom('html' type=static) }, attributes: RefCell { value: Attributes { map: {} } }, template_contents: None }) @ 0x7fe053406010)
NodeRef(Element(ElementData { name: QualName { prefix: None, ns: Atom('http://www.w3.org/1999/xhtml' type=static), local: Atom('head' type=static) }, attributes: RefCell { value: Attributes { map: {} } }, template_contents: None }) @ 0x7fe0534060c0)
NodeRef(Element(ElementData { name: QualName { prefix: None, ns: Atom('http://www.w3.org/1999/xhtml' type=static), local: Atom('body' type=static) }, attributes: RefCell { value: Attributes { map: {} } }, template_contents: None }) @ 0x7fe053406150)
NodeRef(Element(ElementData { name: QualName { prefix: None, ns: Atom('http://www.w3.org/1999/xhtml' type=static), local: Atom('div' type=static) }, attributes: RefCell { value: Attributes { map: {} } }, template_contents: None }) @ 0x7fe0534061e0)
NodeRef(Text(RefCell { value: "a" }) @ 0x7fe053406280)
NodeRef(Element(ElementData { name: QualName { prefix: None, ns: Atom('http://www.w3.org/1999/xhtml' type=static), local: Atom('span' type=static) }, attributes: RefCell { value: Attributes { map: {} } }, template_contents: None }) @ 0x7fe0534063b0)
Which includes no empty text nodes. Could you provide a complete example that demonstrates the problem you see?
I found that the problem was actually concerning removing nodes in general. The code stops detaching nodes after the first one is detached if you use a for loop. You can just close this issue. Sorry I forgot to close it myself
I’d expect that detaching the current node would interact badly with the tree traversal algorithm used by .descendants()
.
The code stops detaching nodes after the first one is detached if you use a for loop.
Indeed it might even stop iterating entirely, as if it had finished going through the whole (sub)tree.
Consider using .descendants().filter(…).collect::<Vec<_>>()
to first find all the nodes to remove, and only then call .detach()
on them.
The docs for descendants
and friends should probably be improved to mention this surprising behavior and the work-around.
That sounds great. Personally my workaround was to use while loops where I set the next element to be accessed before detaching. I found this to be the only way that worked for me. Especially if deletion wasn't the only thing I was doing in a given section of code. I'm still iffy about using detach on any other loop.
I will soon archive this repository and make it read-only, so this issue will not be addressed: https://github.com/kuchiki-rs/kuchiki#archived
I have tried removing text nodes containing only whitespace using
however, the text nodes still end up being there when I later traverse the nodes. Is there a better way of handling this?