utkarshkukreti / select.rs

A Rust library to extract useful data from HTML documents, suitable for web scraping.
MIT License
959 stars 69 forks

Want method to find sibling nodes #57

Open visig9 opened 5 years ago

visig9 commented 5 years ago

For example, suppose I want to find the <p>wanted data</p> node by its position relative to the a-anchor node.

use select::document::Document;
use select::predicate::Attr;

fn main() {
    let html = r#"
    <p>balabala</p>
    <h2 id="a-anchor">Title</h2>
    <p>wanted data</p>
    "#;

    let document = Document::from(html);
    let h2 = document.find(Attr("id", "a-anchor")).next().unwrap();
    // The first next() lands on the whitespace text node, the second on the <p>.
    assert_eq!("wanted data", h2.next().unwrap().next().unwrap().text());
}

Please notice that I call next().unwrap() twice in order to skip the newline / whitespace text node. This approach looks very fragile, especially considering that the site maintainer may drop the newline or make some other trivial tweak to the markup.

I could check each node manually after every next() call until I reach the one I want, but maybe it is worth adding an elegant interface for finding a sibling node, something like find_sibling_forward(Name("p"))?
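
To make the idea concrete, here is a rough sketch of the behaviour I have in mind. It is only an illustration under assumptions: find_sibling_forward is a hypothetical free function, not something select.rs provides, and the Toy enum and demo function are stand-ins for a parsed document so that the snippet compiles with the standard library alone. With real select.rs nodes I would expect to pass |n| n.next() as the step and |n| n.is(Name("p")) as the test, but I have not verified that here.

```rust
use std::iter::successors;

/// Hypothetical helper (NOT part of select.rs): step forward from `start`
/// with `next` and return the first item for which `pred` holds.
/// `start` itself is never tested, only the items after it.
fn find_sibling_forward<T>(
    start: &T,
    next: impl Fn(&T) -> Option<T>,
    pred: impl Fn(&T) -> bool,
) -> Option<T> {
    successors(next(start), |n| next(n)).find(|n| pred(n))
}

/// Toy stand-in for sibling nodes: either a text node or an element
/// carrying (tag name, text content).
#[allow(dead_code)]
enum Toy {
    Text(&'static str),
    Element(&'static str, &'static str),
}

/// Mirrors the HTML from the example above, including the whitespace
/// text nodes between elements. Returns the index of the first <p>
/// that follows the <h2>.
fn demo() -> Option<usize> {
    let siblings = [
        Toy::Element("p", "balabala"),
        Toy::Text("\n"),
        Toy::Element("h2", "Title"),
        Toy::Text("\n"),
        Toy::Element("p", "wanted data"),
    ];

    // "Next sibling" is simply the next index, if there is one.
    let next = |i: &usize| if *i + 1 < siblings.len() { Some(*i + 1) } else { None };
    // Predicate: the node is a <p> element.
    let is_p = |i: &usize| matches!(siblings[*i], Toy::Element("p", _));

    // Start from the <h2> at index 2; the whitespace node is skipped
    // because it does not satisfy the predicate.
    find_sibling_forward(&2, next, is_p)
}
```

The point is that the caller no longer needs to know how many whitespace nodes sit between the anchor and the target: the search keeps stepping until the predicate matches or the siblings run out, in which case it returns None instead of panicking on unwrap().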

Or maybe I missed something in the documentation. If so, please let me know :D