BeautifulSoup, an HTML parsing library written in Python, has a method called .select(css_selector_str). It's incredibly useful for HTML parsing if you know CSS selectors. For example, to print the question titles on Stack Overflow:
import requests
from bs4 import BeautifulSoup
html = requests.get("https://stackoverflow.com/questions/tagged/rust?sort=votes&pageSize=50").text
soup = BeautifulSoup(html, "html.parser")
titles_elements = soup.select("div#questions div.summary > h3 > a")
title_text = [el.text for el in titles_elements]
print(title_text)
This prints:
["What are the differences between Rust's String and str?", 'Why are explicit lifetimes needed in Rust?', "Why doesn't println! work in Rust unit tests?", 'How to access command line parameters?', 'How do I print the type of a variable in Rust?' ... (and many more)
With this library, the equivalent selector right now would be something like:
let iterator = doc.find(And(Name("div"), Attr("id", "questions"))
.descendant(And(Name("div"), Class("summary")))
.child(Name("h3"))
.child(Name("a")));
Would you be open to accepting CSS selectors as strings, or is that out of scope for this library?
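To make the request a bit more concrete, here is a rough Python sketch of how a selector string could be split into the same steps as the predicate chain above. It is only an illustration: `parse_selector` and `COMPOUND` are names I made up (they are not part of BeautifulSoup or this library), and it only understands tag names, `#id`, `.class`, the descendant combinator (a space) and the child combinator (`>`).

```python
import re

# Hypothetical helper for illustration only; not an existing API.
# One compound selector: optional tag name, then any number of #id / .class parts.
COMPOUND = re.compile(r"(?P<tag>[A-Za-z][\w-]*)?(?P<rest>(?:[#.][\w-]+)*)$")


def parse_selector(selector):
    """Split a simple CSS selector into (combinator, tag, id, classes) steps."""
    # Pad ">" with spaces so "h3>a" and "h3 > a" tokenize the same way.
    tokens = selector.replace(">", " > ").split()
    steps = []
    combinator = "find"  # the first compound is the starting match
    for token in tokens:
        if token == ">":
            combinator = "child"
            continue
        match = COMPOUND.match(token)
        if match is None:
            raise ValueError(f"unsupported selector part: {token!r}")
        rest = match.group("rest")
        ids = re.findall(r"#([\w-]+)", rest)
        classes = re.findall(r"\.([\w-]+)", rest)
        steps.append((combinator, match.group("tag"), ids[0] if ids else None, classes))
        # Whitespace alone means "descendant" unless a ">" comes first.
        combinator = "descendant"
    return steps


for step in parse_selector("div#questions div.summary > h3 > a"):
    print(step)
```

The four steps it produces line up one-to-one with the find / descendant / child / child calls in the Rust snippet, so a string-based API could plausibly be a thin layer that builds the existing predicates rather than a separate matching engine.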