kuchiki-rs / kuchiki

(朽木) HTML/XML tree manipulation library for Rust
MIT License

Minor API updates #2

Closed Ygg01 closed 9 years ago

Ygg01 commented 9 years ago

I decided to add some convenience APIs for the following things:

Parsing

It's tedious to write:

     ::parse(Some(html.into()), &arena, Default::default());

Also, Nokogiri uses Html/Xml to differentiate parsers. I assume a similar API would be nice, so I propose the following syntax:

     Html::from_string(html).parse(&arena);

So basically I separated creating and parsing into two steps and absorbed Default into from_string. First you construct an object from a string with Html::from_string(html), then you pass in the arena and it parses. I think this will make it easier to change the internal representation later (e.g. if we change the tree to something that doesn't require an arena).
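
As a rough illustration of the two-step shape described above (not the code from this PR), the sketch below uses toy stand-ins. `Arena`, `ParseOpts`, the `usize` node handle, and the "parser" that merely trims its input are all assumptions made for the example; kuchiki's real types are arena-backed and return node references.

```rust
use std::cell::RefCell;

/// Hypothetical parser options; `Default` supplies what callers used to pass by hand.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ParseOpts {
    pub scripting_enabled: bool,
}

/// Toy arena (an assumption for this sketch): owns the text of every stored "node".
#[derive(Default)]
pub struct Arena {
    nodes: RefCell<Vec<String>>,
}

impl Arena {
    pub fn new() -> Self {
        Self::default()
    }

    /// Stores a node and returns its index as a handle.
    fn alloc(&self, node: String) -> usize {
        let mut nodes = self.nodes.borrow_mut();
        nodes.push(node);
        nodes.len() - 1
    }

    /// Looks a node up by handle, returning a copy of its text.
    pub fn get(&self, id: usize) -> Option<String> {
        self.nodes.borrow().get(id).cloned()
    }
}

/// Step one: an input source plus options, with no arena involved yet.
pub struct Html {
    source: String,
    opts: ParseOpts,
}

impl Html {
    /// Absorbs `Default::default()` so callers do not have to spell it out.
    pub fn from_string<S: Into<String>>(source: S) -> Self {
        Html {
            source: source.into(),
            opts: ParseOpts::default(),
        }
    }

    /// Step two: the arena is supplied only when parsing actually happens.
    /// This stand-in just stores the trimmed input; a real parser would build a tree.
    pub fn parse(self, arena: &Arena) -> usize {
        let _ = &self.opts; // options would steer a real parser
        arena.alloc(self.source.trim().to_string())
    }
}
```

Because the arena only appears in `parse`, a later change to the tree representation would touch that one method's signature rather than every construction site.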

Input functions

For convenience I've added from_string and from_file methods to kuchiki. The idea is that people can just pass a path to read and parse a file. A possible future improvement is integrating hyper to retrieve an HTML page or XML file and parse it.
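
One way a `from_file` constructor could be layered on top of `from_string` is sketched below. The `Html` struct here is a hypothetical stand-in, and returning `io::Result` rather than panicking is an assumption of the example; the thread does not say how the PR reports I/O errors.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical input holder; not kuchiki's actual type.
#[derive(Debug, PartialEq)]
pub struct Html {
    source: String,
}

impl Html {
    /// Wraps an in-memory string as parser input.
    pub fn from_string<S: Into<String>>(source: S) -> Self {
        Html { source: source.into() }
    }

    /// Reads the whole file as UTF-8 and defers to `from_string`.
    /// I/O and encoding errors are returned to the caller, not unwrapped.
    pub fn from_file<P: AsRef<Path>>(path: P) -> io::Result<Self> {
        let source = fs::read_to_string(path)?;
        Ok(Html::from_string(source))
    }

    /// Exposes the stored input, mainly so the sketch can be checked.
    pub fn source(&self) -> &str {
        &self.source
    }
}
```

Accepting `AsRef<Path>` lets callers pass a `&str`, `String`, or `PathBuf` without converting first.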

Serialization

I provided a ToString implementation for Node<'a>, which allows any node to be serialized with a simple node.to_string() instead of calling serialize by hand. The snippet below shows the existing approach followed by the new one:

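    // Existing approach: serialize into a byte buffer, then convert it to a String.
    let mut serialized = Vec::new();
    serialize(&mut serialized, document, Default::default()).unwrap();
    assert_eq!(String::from_utf8(serialized).unwrap(), r"<!DOCTYPE html>");
    // With the new ToString implementation:
    assert_eq!(document.to_string(), r"<!DOCTYPE html>...");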
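
In Rust, implementing `std::fmt::Display` provides `to_string()` automatically through the standard library's blanket `ToString` impl, so that is one plausible route to the convenience described above. The sketch below shows it on a toy `Node` enum, which is an assumption for illustration only; the PR may implement `ToString` differently, and this version does no escaping.

```rust
use std::fmt;

/// Toy node type (hypothetical); kuchiki's real `Node<'a>` is arena-backed and richer.
pub enum Node {
    Doctype(String),
    Text(String),
    Element { name: String, children: Vec<Node> },
}

impl fmt::Display for Node {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Node::Doctype(name) => write!(f, "<!DOCTYPE {}>", name),
            // No escaping in this sketch; a real serializer must escape text.
            Node::Text(text) => write!(f, "{}", text),
            Node::Element { name, children } => {
                write!(f, "<{}>", name)?;
                // Recursively serialize each child between the open and close tags.
                for child in children {
                    write!(f, "{}", child)?;
                }
                write!(f, "</{}>", name)
            }
        }
    }
}
```

With this in place, `node.to_string()` and `format!("{}", node)` both work without any explicit buffer handling.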

Selectors

I added a css method to Node<'a> that filters all descendants of the node and returns an iterator for further processing. It's a convenience method that replaces:

    let document = ::parse(Some(html.into()), &arena, Default::default());
    let selectors = ::selectors::parser::parse_author_origin_selector_list_from_str("p.foo").unwrap();
    let matching = document.descendants()
       .filter(|node| node.is_element() && ::selectors::matching::matches(&selectors, node, &None))
       .collect::<Vec<_>>();

with the following:

    let document = Html::from_string(html).parse(&arena);
    let matching = document.css("p.foo").collect::<Vec<_>>();
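
To make the "filter descendants and return an iterator" idea concrete, here is a minimal sketch on a toy tree. Everything in it is an assumption for illustration: the owned `Node` struct, the `SimpleSelector` type, and a selector grammar limited to `tag`, `.class`, or `tag.class`. The real method relies on the selectors crate and supports full CSS selectors.

```rust
/// Toy element tree; hypothetical stand-in for kuchiki's arena-backed `Node<'a>`.
pub struct Node {
    pub tag: String,
    pub classes: Vec<String>,
    pub children: Vec<Node>,
}

/// Hypothetical, deliberately tiny selector: `tag`, `.class`, or `tag.class`.
struct SimpleSelector {
    tag: Option<String>,
    class: Option<String>,
}

impl SimpleSelector {
    fn parse(input: &str) -> Self {
        let (tag, class) = match input.split_once('.') {
            Some((tag, class)) => (tag, Some(class.to_string())),
            None => (input, None),
        };
        SimpleSelector {
            tag: if tag.is_empty() { None } else { Some(tag.to_string()) },
            class,
        }
    }

    /// A missing tag or class part matches anything.
    fn matches(&self, node: &Node) -> bool {
        self.tag.as_ref().map_or(true, |t| *t == node.tag)
            && self.class.as_ref().map_or(true, |c| node.classes.contains(c))
    }
}

impl Node {
    pub fn new(tag: &str, classes: &[&str], children: Vec<Node>) -> Node {
        Node {
            tag: tag.to_string(),
            classes: classes.iter().map(|c| c.to_string()).collect(),
            children,
        }
    }

    /// Depth-first, pre-order walk over every descendant (excluding `self`).
    pub fn descendants<'a>(&'a self) -> impl Iterator<Item = &'a Node> {
        let mut stack: Vec<&'a Node> = self.children.iter().rev().collect();
        std::iter::from_fn(move || {
            let node = stack.pop()?;
            // Push children in reverse so the first child is visited next.
            stack.extend(node.children.iter().rev());
            Some(node)
        })
    }

    /// Parses the selector once, then lazily filters the descendants.
    pub fn css<'a>(&'a self, selector: &str) -> impl Iterator<Item = &'a Node> {
        let selector = SimpleSelector::parse(selector);
        self.descendants().filter(move |node| selector.matches(node))
    }
}
```

Returning an iterator instead of a `Vec` keeps the method lazy, so callers can chain `take`, `map`, or `collect` as needed.
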

SimonSapin commented 9 years ago

Thanks a lot!