rubys / nokogumbo

A Nokogiri interface to the Gumbo HTML5 parser.
Apache License 2.0
186 stars 114 forks source link

html-proofer failed to pick up html validation errors. #141

Closed Raza403 closed 2 years ago

Raza403 commented 4 years ago

Nokogumbo is an upstream dependency of htmlproofer Test Case** https://github.com/Raza403/test

In index.html there are

And build tested with html-proofer is passing here https://travis-ci.com/Raza403/test/builds/120699310

The same code was run through https://validator.nu and following errors are found. 1

The maintainer of htmlproofer said "The parser relies on nokogumbo, so a report should be filed for that repo/project. "

fulldecent commented 4 years ago

To the Nokogumbo team: please advise if there is a better format we can use for this and future bug reports.

stevecheckoway commented 4 years ago

When I run that HTML through Nokogumbo, I get three errors.

irb(main):029:0> doc = Nokogiri.HTML5(html, max_errors: 100)
irb(main):030:0> doc.errors.each { |e| puts(e) }
12:1: ERROR: That tag isn't allowed here  Currently open tags: html, body, div, div, p.
</body>
^
15:1: ERROR: That tag isn't allowed here  Currently open tags: html, body, div, div, p.
<hr>
^
16:1: ERROR: Premature end of file  Currently open tags: html, body, div, div.

^

The HTML standard specifies what must be a parse error. This is an area that is changing and the conformance tests aren't completely correct here. See the following PRs I filed.

At some point, I went through and tried to ensure that Nokogumbo produces exactly one error for each error the standard specifies. If Nokogumbo (really our included Gumbo) has diverged from the HTML living standard, then I'm happy to fix it.

Looking at the errors produced by validator.nu, the first three should be a single error (I think), the fourth is correct, and the fifth is bogus. It's not possible to write HTML in such a way that the parser cannot recover and produce more error messages. In particular, there should be an error message for the end of file with open tags.

Without checking the standard, my inclination here is that Nokogumbo is correct, validator.nu is incorrect, and whatever is causing html-proofer to not emit errors isn't the fault of Nokogumbo.

stevecheckoway commented 4 years ago

Also, https://travis-ci.com/Raza403/test/builds/120699310 doesn't show anything since the bundle install failed for some reason.

fulldecent commented 4 years ago

@Raza403 Please minimize test case to show how htmlproofer is broken, and let's see exactly what htmlproofer is sending to Nokogumbo.

flavorjones commented 2 years ago

Closing. If you think this is still an issue, please open a new Issue at https://github.com/sparklemotion/nokogiri