yoshuawuyts / html

Type-safe HTML support for Rust
Apache License 2.0

Parsing HTML #13

Open XAMPPRocky opened 1 year ago

XAMPPRocky commented 1 year ago

Hello, I saw the announcement and am pretty excited to use this crate, especially given how you're generating these types from the spec itself; there aren't many HTML crates that are both strongly typed and capable of dynamic generation. I was wondering, though, whether you'd be interested in adding a parser implementation, so that it's possible to accept HTML input as well as generate output?

The reason I'd like parsing in this crate, rather than using existing crates like html5ever, is that html5ever and similar crates aim to be "browser-grade" HTML parsers: they accept "Quirks Mode" HTML and often broken syntax, because they're parsing web pages. That's not the type of HTML parsing I'm interested in. I want to be able to parse specification-compliant HTML and provide helpful error messages when it's invalid (this is currently close to impossible with html5ever).
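
To make the distinction concrete, here is a rough sketch of the "reject and explain" behaviour, as opposed to the "recover and keep going" behaviour of a browser-grade parser. Everything in it is hypothetical: `StrictError` and `check_balanced` are names made up for illustration and are not part of this crate or html5ever. It only handles bare tag names (no attributes, void elements, comments, or doctype), so it is nowhere near a real HTML parser.

```rust
use std::fmt;

/// Hypothetical error type for a strict (non-recovering) parser.
/// Every variant carries the byte offset of the offending tag.
#[derive(Debug, PartialEq)]
pub enum StrictError {
    /// A closing tag did not match the innermost open element.
    Mismatched { expected: String, found: String, offset: usize },
    /// A closing tag appeared while no element was open.
    UnexpectedClose { found: String, offset: usize },
    /// The input ended while an element was still open.
    Unclosed { tag: String, offset: usize },
    /// A `<` was never followed by a `>`.
    UnterminatedTag { offset: usize },
}

impl fmt::Display for StrictError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StrictError::Mismatched { expected, found, offset } => write!(
                f,
                "expected `</{expected}>` but found `</{found}>` at byte {offset}"
            ),
            StrictError::UnexpectedClose { found, offset } => {
                write!(f, "`</{found}>` at byte {offset} closes nothing")
            }
            StrictError::Unclosed { tag, offset } => {
                write!(f, "`<{tag}>` opened at byte {offset} is never closed")
            }
            StrictError::UnterminatedTag { offset } => {
                write!(f, "`<` at byte {offset} has no matching `>`")
            }
        }
    }
}

/// Checks that tags are properly nested, stopping at the first problem
/// instead of silently repairing the tree.
pub fn check_balanced(input: &str) -> Result<(), StrictError> {
    // Stack of currently open elements: (tag name, byte offset of `<`).
    let mut open: Vec<(&str, usize)> = Vec::new();
    let mut pos = 0;

    while let Some(lt) = input[pos..].find('<') {
        let offset = pos + lt;
        let Some(gt) = input[offset..].find('>') else {
            return Err(StrictError::UnterminatedTag { offset });
        };
        let name = &input[offset + 1..offset + gt];
        pos = offset + gt + 1;

        if let Some(closing) = name.strip_prefix('/') {
            match open.pop() {
                Some((expected, _)) if expected == closing => {}
                Some((expected, _)) => {
                    return Err(StrictError::Mismatched {
                        expected: expected.to_string(),
                        found: closing.to_string(),
                        offset,
                    });
                }
                None => {
                    return Err(StrictError::UnexpectedClose {
                        found: closing.to_string(),
                        offset,
                    });
                }
            }
        } else {
            open.push((name, offset));
        }
    }

    // Anything left on the stack was opened but never closed.
    match open.pop() {
        Some((tag, offset)) => Err(StrictError::Unclosed { tag: tag.to_string(), offset }),
        None => Ok(()),
    }
}
```

With this sketch, `check_balanced("<p><em>hi</p>")` returns a `Mismatched` error pointing at the stray `</p>`, whereas a browser-grade parser would quietly repair the tree and carry on.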

There is a gap in the Rust ecosystem for HTML parsing that follows the specification, for use in tools like compilers, content management systems, and servers, where you want to accept well-formed, compliant HTML, not just any HTML that will render in a browser. It would also be nice to have it in this crate, as its types would make it easy to parse not just documents but fragments, and it would remove the need for users to write conversion code from crates like html5ever to your crate.

yoshuawuyts commented 1 year ago

Hey! :wave: Yes, support for parsing would be amazing to have. I haven't researched this much, so, perhaps naively, I assumed that we could leverage html5ever for this. But I trust you if you say that it wouldn't be a great fit. If this is something you'd be interested in contributing, I'd be happy to work with you to get this landed!