Open XAMPPRocky opened 7 years ago
Yes, I would definitely like to have this feature. Unfortunately, this would require major changes in the internals of the crate and I'm not sure what a good design would look like at this point.
Right now `Document` has a vector of `node::Row`, and `node::Node` has a reference to the `Document` and a `usize` index. This means that allowing nodes to be removed or inserted will require some kind of arena-like structure, so that removed spots are available for reuse by nodes inserted later. We'll also have to stop storing a reference to `Document` in `Node`, so that the `Document` can be mutated while one of its `Node`s exists. We could have `Node` be just an index, like the `petgraph` crate does, but that would make many current APIs verbose, e.g. `document[node].text()` instead of `node.text()`. Or we could just wrap everything in `Rc<RefCell<>>`, but I'd like to avoid that if at all possible.
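To make the arena idea concrete, here is a minimal sketch of a slot vector with a free list and plain index handles. All of the names (`Arena`, `Slot`, `NodeId`) are hypothetical and the payload is reduced to a `String`; none of this is select.rs's actual API, and it is not a proposal for the final design.

```rust
/// Hypothetical index-only handle, in the spirit of petgraph's node indices.
/// It holds no reference to the arena, so the arena can be mutated freely.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

/// One slot of the arena: either a live node or a link in the free list.
#[derive(Debug)]
enum Slot {
    /// Live node; the payload is simplified to plain text for this sketch.
    Occupied(String),
    /// Removed node; points at the next free slot, if any.
    Free { next_free: Option<usize> },
}

/// Hypothetical arena: a Vec of slots plus the head of the free list.
#[derive(Default)]
struct Arena {
    slots: Vec<Slot>,
    free_head: Option<usize>,
}

impl Arena {
    /// Inserts a node, reusing a previously removed slot when one exists.
    fn insert(&mut self, text: &str) -> NodeId {
        match self.free_head {
            Some(i) => {
                // Pop the slot off the free list before overwriting it.
                if let Slot::Free { next_free } = &self.slots[i] {
                    self.free_head = *next_free;
                }
                self.slots[i] = Slot::Occupied(text.to_string());
                NodeId(i)
            }
            None => {
                self.slots.push(Slot::Occupied(text.to_string()));
                NodeId(self.slots.len() - 1)
            }
        }
    }

    /// Removes a node and pushes its slot onto the free list.
    /// Returns None if the slot is out of range or already free.
    fn remove(&mut self, id: NodeId) -> Option<String> {
        let slot = self.slots.get_mut(id.0)?;
        if matches!(*slot, Slot::Free { .. }) {
            return None;
        }
        let next_free = self.free_head;
        let old = std::mem::replace(slot, Slot::Free { next_free });
        self.free_head = Some(id.0);
        match old {
            Slot::Occupied(text) => Some(text),
            Slot::Free { .. } => None,
        }
    }

    /// The verbose access style mentioned above: `arena.text(id)`
    /// rather than `node.text()`.
    fn text(&self, id: NodeId) -> Option<&str> {
        match self.slots.get(id.0)? {
            Slot::Occupied(text) => Some(text.as_str()),
            Slot::Free { .. } => None,
        }
    }
}
```

One caveat with this sketch: a handle to a removed node becomes valid again once its slot is reused, so a stale `NodeId` can silently point at a different node. Generational indices are one common way to detect that, at the cost of a larger handle.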
I'm open to suggestions!
Would it be hard to just blank out the contents of the node? Or have a `node::RemovedNode`?
[edit] Soft deletes: https://github.com/sbeckeriv/select.rs/commit/da9b2451a54bd2ceeef61a23630cb689d958c44d I did not read all of the code, so I may be missing why this is a bad idea. It is just a proof of concept for my needs. [edit] It is not working as I would expect:
for mut node in &mut document
    .find(select::predicate::Name("noscript"))
    .borrow_mut()
{
    node.delete();
    dbg!(node);
}
Here `node` shows `deleted` as true, but when the `text()` function is called the node is not marked as deleted. [edit] It might have worked; my document wasn't declared as `mut` at first. I moved to a local version that takes the index numbers of the nodes I want to exclude and skips them in the text view: https://github.com/sbeckeriv/select.rs/commit/2bb9c9d9edddf4c593ea624c2e2147c92a7f0b08#diff-af08c3181737aa5783b96dfd920cd5ef70829f46cd1b697bdb42414c97310e13R143 I have since moved the function out of my fork and keep a local text function instead.
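For anyone who wants the same workaround, the skip-by-index idea can be sketched roughly as below. It uses a toy tree (`Doc`, `Data`, `find_by_name`, and `text_skipping` are all hypothetical names) rather than the real select.rs types, so treat it as an illustration of the approach and not as the code from the commit above.

```rust
use std::collections::HashSet;

/// Toy stand-in for a parsed document: nodes live in one flat Vec and
/// refer to their children by index.
enum Data {
    Text(String),
    Element { name: String, children: Vec<usize> },
}

struct Doc {
    nodes: Vec<Data>,
}

impl Doc {
    /// Collects the indices of every element with the given tag name.
    fn find_by_name(&self, name: &str) -> HashSet<usize> {
        self.nodes
            .iter()
            .enumerate()
            .filter(|(_, data)| {
                matches!(data, Data::Element { name: n, .. } if n.as_str() == name)
            })
            .map(|(index, _)| index)
            .collect()
    }

    /// Concatenates the text under `index`, leaving out any subtree whose
    /// root index is in `skip`. The document itself is never mutated.
    fn text_skipping(&self, index: usize, skip: &HashSet<usize>) -> String {
        let mut out = String::new();
        self.collect_text(index, skip, &mut out);
        out
    }

    fn collect_text(&self, index: usize, skip: &HashSet<usize>, out: &mut String) {
        if skip.contains(&index) {
            return; // "soft delete": the node stays in the tree but is ignored
        }
        match &self.nodes[index] {
            Data::Text(text) => out.push_str(text),
            Data::Element { children, .. } => {
                for &child in children {
                    self.collect_text(child, skip, out);
                }
            }
        }
    }
}
```

Because nothing is removed from the tree, this sidesteps the borrowing problem entirely; the cost is that every text extraction has to carry the skip set along.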
BeautifulSoup offers the ability to remove nodes from the scraper. This is valuable for removing certain kinds of text or elements from text.