sparklemotion / nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
https://nokogiri.org/
MIT License
6.13k stars 897 forks

[feature request] HTML5 parser for JRuby implementation #2227

Open flavorjones opened 3 years ago

flavorjones commented 3 years ago

This issue is a placeholder for collaboration with the JRuby community to find a way to provide HTML5-compliant parsing for Nokogiri's JRuby implementation.

#2204 provides an HTML5 parser for the CRuby implementation by leveraging the Gumbo parser, which is implemented in C, together with a C extension that is tightly coupled to libxml2. As a result, the Nokogiri::HTML5 module will not be immediately available on JRuby, which uses Xerces in place of libxml2.
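Until then, code that has to run on both CRuby and JRuby may want to fall back to the older HTML parser when Nokogiri::HTML5 is missing. The following is only a rough sketch of that idea, not part of Nokogiri: `pick_html_parser` is a hypothetical helper, and it assumes the HTML5 constant is simply not defined on platforms without the Gumbo-based parser.

```ruby
# Hypothetical helper (not part of Nokogiri): return the HTML5 parser module
# when the given namespace defines one, otherwise fall back to the HTML module.
# Assumption: on JRuby the HTML5 constant is absent rather than defined-but-broken.
def pick_html_parser(namespace)
  # Pass `false` so only constants defined directly on the namespace are
  # considered, not a top-level constant that happens to share the name.
  if namespace.const_defined?(:HTML5, false)
    namespace.const_get(:HTML5, false)
  else
    namespace.const_get(:HTML, false)
  end
end

# Intended use, with the nokogiri gem installed:
#   require "nokogiri"
#   doc = pick_html_parser(Nokogiri).parse("<p>hello</p>")
```

Taking the namespace as an argument keeps the check easy to exercise without the gem loaded. Note that the two parsers can build different trees for the same malformed markup, so a fallback like this changes behavior and is not a drop-in substitute.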

The Nokogiri maintainers feel this is important and hope to work on it in the future. If you're interested in helping with HTML5 support on JRuby, please comment on this issue or ping the maintainers on the mailing list or the Discord channel.

rubys commented 3 years ago

Possible starting point: https://about.validator.nu/htmlparser/