kuchiki-rs / kuchiki

(朽木) HTML/XML tree manipulation library for Rust
MIT License

Example of re-use of a compiled set of selectors #45

Closed: ticky closed this issue 1 year ago.

ticky commented 6 years ago

Hi there,

I’m trying to use this library to run a set query on a large number of documents.

I’m parsing each document with kuchiki::parse_html().one(), and I’m stuck on how to actually run the filter against the result. I know I can call select on the returned document NodeRef, but I have already gone to the trouble of compiling my selector, and I can’t seem to figure out how to actually make use of it!

I’ve tried using .inclusive_descendants() on my NodeRef, but then the compiler complains about a type mismatch.

I’d love to know the expected way to approach this. The language used to describe the Selector objects seems to suggest that reusing them in compiled form is intended to be possible!

SimonSapin commented 6 years ago

Hi. Sorry the docs are unclear! They could definitely use a lot of improvement.

Selectors::filter takes an iterator of NodeDataRef<ElementData>, which are references to nodes that are known to be elements (so that element-specific data can be accessed directly). inclusive_descendants returns an iterator of all nodes, including text nodes, comments, etc. So you likely need to add .elements() (and use kuchiki::traits::*; to have the appropriate trait in scope) to filter that iterator further.

I’ve added an example in https://github.com/kuchiki-rs/kuchiki/commit/66d8e2bcb65249713e4be57d6e94cc51d4133fe0

ticky commented 6 years ago

Aha! That’ll be the missing piece. Thanks a bunch for the quick response and the clarification! 😄

I find myself wondering why trait methods like that aren’t very discoverable in the documentation; it takes a bit of digging to work out that struct kuchiki::iter::Descendants is an Iterator<Item = NodeRef> and therefore has the methods of trait kuchiki::iter::NodeIterator available to it. This is probably more a question for Rustdoc than for you! 🤔

SimonSapin commented 6 years ago

rustdoc actually fixed this recently (https://github.com/rust-lang/rust/pull/52585), so when I generate docs with Nightly I see this. But even so, this API merits a lot more explanation in prose.

[Screenshot from 2018-09-01]

ticky commented 6 years ago

Oh that’s excellent news! I think that’ll help a lot.

Ygg01 commented 4 years ago

I'd still like an example for new users. People don't read tests that often, unless they have to.

SimonSapin commented 1 year ago

I will soon archive this repository and make it read-only, so this issue will not be addressed: https://github.com/kuchiki-rs/kuchiki#archived