Closed domenic closed 4 months ago
Figuring out how to create some version of
RcDom
with these overrides seems likely to be something Rust supports
Not... really, it seems. https://chat.openai.com/share/fa746c1a-403f-4611-bce7-7277e21114d8
An alternative hacky option we could always have is to treat validation as a "lint" step as part of the GitHub actions, as opposed to a feature of the core build process itself. Then we don't need the core build's parser to track errors in detail. Obviously it's less elegant, but potentially an option. Or in between -- the core build simply tracks whether there were errors, and then we do a separate lint/diagnostic pass to describe the errors given there were some.
Either of these might be preferable to fully forking markup5ever_rcdom.
@jeremyroman @domfarolino for review if you all have time.
This fixes #290, by outputting the parse errors encountered by the Rust build step's parser. Previously they were being stored in the
RcDom
instance'serrors
vector, and ignored. Now they are threaded through to the finalio::Result
, and then output bymain()
.The hardest part of this was adding line numbers to the errors. Doing this necessitated creating a wrapper for
RcDom
, which I inventively calledRcDomWithLineNumbers
, which implementsTreeSink
with two methodsparse_error
andset_current_line
given custom behavior, while the other many methods just delegate toRcDom
's implementation.(There doesn't appear to be any better way to do this in Rust, currently. If you search for "rust delegation" you'll find lots of discussion of the problem, and various solutions which work if you can modify the delegated-to class, i.e.
RcDom
in our case. But nothing seems to work if the delegated-to class is in third-party code.)Additionally, this enables
exact_errors
as a parser option, which provides slightly more information in a couple of cases related to character references.Original PR description
I spent some time trying to address https://github.com/whatwg/html-build/issues/290 This sort of works. (Although the code needs a lot of cleanup; probably we want to bubble the errors to the top level instead of printing them immediately.) However, the biggest problem is that there is no line number information. So the output is terrible: ``` Compiling html-build v0.0.0 (/home/domenic/src/html-build) Finished release [optimized] target(s) in 3.14s Running `target/release/html-build` Attributes on an end tag No matching tag to close Error: Custom { kind: InvalidData, error: "Parse errors encountered" } ``` The upstream issue is https://github.com/servo/html5ever/issues/48 . I think this is possible to fix if we do something where we override the `RcDom` sink so that instead of [doing nothing when `set_current_line` is called](https://docs.rs/markup5ever/0.11.0/src/markup5ever/interface/tree_builder.rs.html#238), it stores that information somewhere. And then I guess we'd override [the `parse_error` implementation](https://docs.rs/markup5ever_rcdom/latest/src/markup5ever_rcdom/lib.rs.html#224-226) to use that stored information. Figuring out how to create some version of `RcDom` with these overrides seems likely to be something Rust supports, but I'm running out time for playing with Rust today. If anyone else wants to pitch in, patches are appreciated. /cc @noamr @jeremyroman