sparklemotion / nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
https://nokogiri.org/
MIT License

explore: alternative CSS selector parsers #2560

Open flavorjones opened 2 years ago

flavorjones commented 2 years ago

The CSS selector parser we have is complex, and selector parsing is really a separable concern from Nokogiri proper. It would be nice if we were able to use an existing parser.

(Side note: the generation of XPath from the CSS is a Nokogiri concern, though, since the generated XPath query is often tightly coupled to the version of libxml or the C extension. Perhaps we can spin this off as a separate gem/concern at some point, but it would need to be pluggable to do Nokogiri-specific XPath things, and I don't feel like that's worth the effort right now.)
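To make the split between "parse the selector" and "generate XPath" concrete, here is a minimal sketch, assuming a toy AST. The node structs, `ToyXPathVisitor`, and `toy_xpath_for` are hypothetical names invented for illustration; they are not Nokogiri's internal classes or API, and only type, class, and id selectors with the descendant combinator are handled.

```ruby
# Hypothetical AST node types for illustration only (not Nokogiri's classes).
TypeSelector  = Struct.new(:name)          # e.g. "li"
ClassSelector = Struct.new(:name)          # e.g. ".item"
IdSelector    = Struct.new(:name)          # e.g. "#main"
Compound      = Struct.new(:parts)         # e.g. "li.item#main"
Descendant    = Struct.new(:left, :right)  # e.g. "ul li"

# Walks the toy AST and emits an XPath 1.0 expression. The parser that builds
# the AST knows nothing about XPath; all XPath decisions live in this class.
class ToyXPathVisitor
  def visit(node)
    case node
    when Descendant then "#{visit(node.left)}//#{visit(node.right)}"
    when Compound   then visit_compound(node)
    else raise ArgumentError, "unsupported node: #{node.inspect}"
    end
  end

  private

  def visit_compound(node)
    type = node.parts.find { |part| part.is_a?(TypeSelector) }
    others = node.parts.reject { |part| part.is_a?(TypeSelector) }
    predicates = others.map { |part| "[#{predicate(part)}]" }.join
    "#{type ? type.name : '*'}#{predicates}"
  end

  def predicate(part)
    case part
    when IdSelector    then "@id='#{part.name}'"
    when ClassSelector then class_predicate(part.name)
    else raise ArgumentError, "unsupported part: #{part.inspect}"
    end
  end

  # Portable XPath 1.0 class test. This is the kind of method a
  # library-specific subclass could override with something faster.
  def class_predicate(name)
    "contains(concat(' ',normalize-space(@class),' '),' #{name} ')"
  end
end

# Hypothetical entry point: prefixes the query so it matches anywhere.
def toy_xpath_for(ast, visitor = ToyXPathVisitor.new)
  "//#{visitor.visit(ast)}"
end
```

The pluggability mentioned above would amount to passing a different visitor as the second argument; everything before that point (the AST) could, in principle, come from any parser.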

Some things to look at that generate ASTs for CSS:

I'd also like to fix some outstanding bugs in the current implementation:

Note, though, that the behavior changes needed to fix these bugs probably justify a 2.0 major release, because they will break existing apps.

And then I think we can also introduce some new features:

flavorjones commented 2 months ago

I looked at Crass a bit yesterday and today, but it doesn't return a sufficiently fine-grained AST for selectors; we'd have to use the tokens and implement some sort of parser to make it work.
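For a sense of what "use the tokens and implement some sort of parser" involves, here is a rough sketch, assuming a made-up token format. The `toy_tokenize` and `toy_parse` helpers and the `[:kind, value]` token shape are hypothetical and do not reflect what Crass actually returns; the grammar covers only type, class, and id selectors joined by the descendant combinator.

```ruby
require "strscan"

# Maps toy token kinds to toy AST node kinds.
TOY_NODE_KINDS = { ident: :type, hash: :id, class: :class }.freeze

# Toy tokenizer: turns a selector string into [kind, value] pairs.
# The token shapes are invented for this sketch.
def toy_tokenize(selector)
  scanner = StringScanner.new(selector.strip)
  tokens = []
  until scanner.eos?
    if scanner.scan(/\s+/)
      tokens << [:whitespace, " "]
    elsif scanner.scan(/#([\w-]+)/)
      tokens << [:hash, scanner[1]]
    elsif scanner.scan(/\.([\w-]+)/)
      tokens << [:class, scanner[1]]
    elsif scanner.scan(/[\w-]+/)
      tokens << [:ident, scanner.matched]
    else
      raise ArgumentError, "unexpected input at offset #{scanner.pos}"
    end
  end
  tokens
end

# Toy parser: groups adjacent non-whitespace tokens into compound selectors,
# then folds the compounds left to right into nested :descendant nodes.
def toy_parse(selector)
  tokens = toy_tokenize(selector)
  raise ArgumentError, "empty selector" if tokens.empty?

  groups = tokens
    .chunk_while { |a, b| a[0] != :whitespace && b[0] != :whitespace }
    .reject { |group| group[0][0] == :whitespace }

  compounds = groups.map do |group|
    [:compound, group.map { |kind, value| [TOY_NODE_KINDS.fetch(kind), value] }]
  end

  compounds.reduce { |left, right| [:descendant, left, right] }
end
```

Calling `toy_parse("ul li.item")` yields a nested array with a `:descendant` node at the root and one `:compound` node per side. Anything outside the toy grammar (for example `a > b`) raises `ArgumentError` rather than being silently mis-parsed.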

Looking at syntax_tree-css, it's incomplete but definitely produces a well-formed AST for selectors. I've started kicking the tires and making basic improvements to see how far I can take it.

flavorjones commented 2 months ago

PRs against syntax_tree-css to get it to where we can integrate it:

PRs against nokogiri with this goal in mind:

flavorjones commented 1 month ago

More PRs against Nokogiri (I broke up #3218 because it was too big):

flavorjones commented 1 month ago

I've got a branch where the hand-written parser work is progressing, in case anybody wants to follow along: https://github.com/sparklemotion/nokogiri/tree/2560-start-custom-css-parser