utkarshkukreti / select.rs

A Rust library to extract useful data from HTML documents, suitable for web scraping.
MIT License
971 stars, 69 forks

Upgraded test framework to version compatible with latest nightlies. #50

Closed: gilescope closed this 5 years ago

gilescope commented 6 years ago

Added an example of how to get just the plain text from a webpage (e.g. for machine learning, where you sometimes only want the plain text).

If there's a neater way to do it, I'd love to see it. I couldn't find a dedicated root-node function on Document. Did I miss one? If not, it would be nice to expose it as a specific function rather than relying on nth(0).

utkarshkukreti commented 6 years ago

Thanks for the PR!

Node::text() already exists, but it doesn't remove script / noscript elements. Is that why you're building your own function? In any case, I don't think this is an example worth having in this repository.
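
For readers following along, the difference between plain text collection and collection that drops script / noscript content can be sketched roughly as below. This is only an illustration on a toy tree: `ToyNode`, `visible_text`, and `sample_tree` are hypothetical names made up for this sketch, not select.rs types or functions, and the PR's actual example may be structured differently.

```rust
// Toy DOM, NOT the select.rs types: just enough structure to show the idea.
enum ToyNode {
    Text(String),
    Element { name: String, children: Vec<ToyNode> },
}

// Hypothetical helper: append the text under `node` to `out`, skipping any
// subtree rooted at a <script> or <noscript> element.
fn visible_text(node: &ToyNode, out: &mut String) {
    match node {
        ToyNode::Text(text) => out.push_str(text),
        ToyNode::Element { name, children } => {
            if matches!(name.as_str(), "script" | "noscript") {
                return; // drop the whole subtree, including nested text
            }
            for child in children {
                visible_text(child, out);
            }
        }
    }
}

// Small constructors to keep the sample tree readable.
fn text(s: &str) -> ToyNode {
    ToyNode::Text(s.to_string())
}

fn element(name: &str, children: Vec<ToyNode>) -> ToyNode {
    ToyNode::Element { name: name.to_string(), children }
}

// Builds <body><p>Hello</p><script>var x = 1;</script><p> world</p></body>.
fn sample_tree() -> ToyNode {
    element(
        "body",
        vec![
            element("p", vec![text("Hello")]),
            element("script", vec![text("var x = 1;")]),
            element("p", vec![text(" world")]),
        ],
    )
}
```

Calling `visible_text(&sample_tree(), &mut out)` on an empty `String` leaves only the paragraph text in `out`; a collector without the script / noscript check would also include `var x = 1;`.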

I agree about .nth(0); we should have a Document::root() which returns Node (not Option<Node> since nth(0) will always succeed).
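
A rough sketch of what such an accessor could look like, using toy stand-ins rather than the real select.rs types. `ToyDocument`, `ToyNode`, `new`, and `push` are hypothetical names for illustration, and the sketch assumes, as stated above, that a document always holds a node at index 0.

```rust
// Toy stand-ins, NOT the select.rs types.
struct ToyDocument {
    // Invariant assumed for this sketch: index 0 always holds the root node.
    nodes: Vec<String>,
}

// A lightweight handle to one node of a ToyDocument.
struct ToyNode<'a> {
    index: usize,
    name: &'a str,
}

impl ToyDocument {
    // The constructor inserts the root, so the document is never empty.
    fn new(root_name: &str) -> Self {
        ToyDocument { nodes: vec![root_name.to_string()] }
    }

    // Hypothetical helper that appends another node after the root.
    fn push(&mut self, name: &str) {
        self.nodes.push(name.to_string());
    }

    // General lookup: may fail, so it returns an Option.
    fn nth(&self, n: usize) -> Option<ToyNode<'_>> {
        self.nodes
            .get(n)
            .map(|name| ToyNode { index: n, name: name.as_str() })
    }

    // Dedicated root accessor: returns the node directly, because the
    // invariant above guarantees that nth(0) is always Some.
    fn root(&self) -> ToyNode<'_> {
        self.nth(0).expect("a document always contains its root node")
    }
}
```

With this shape, callers write `doc.root()` instead of `doc.nth(0).unwrap()`, and the "cannot fail" reasoning lives in one place next to the invariant it depends on.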

utkarshkukreti commented 5 years ago

speculate has been updated to 0.1.2 in #55.