rubys / nokogumbo

A Nokogiri interface to the Gumbo HTML5 parser.
Apache License 2.0

img width="280px" should throw an Error #147

Closed thisconnect closed 4 years ago

thisconnect commented 4 years ago
<!doctype html>
<meta charset="utf-8">
<title>asdf</title>
<img width="280px" height="120px" src="file.jps" alt="drawing" >

With html-proofer 3.15.3 this passes without any errors, but validator.w3.org/nu does complain about width="280px".

https://validator.w3.org/nu/#textarea

expected

something like Error: Bad value 280px for attribute width on element img: Expected a digit but saw p instead.

related

https://github.com/gjtorikian/html-proofer/issues/564

stevecheckoway commented 4 years ago

Nokogumbo does not do HTML validation. It parses HTML according to the HTML living standard. In particular, it performs the tokenization and tree construction steps.

Unless I'm forgetting something, those rules will construct the following DOM:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>asdf</title>
  </head>
  <body>
    <img width="280px" height="120px" src="file.jps" alt="drawing">
  </body>
</html>

And I don't believe any step in that construction produces a parse error.

In particular, nothing in those two steps enforces the content models of HTML elements, checks that elements carry only defined attributes, or checks that attribute values are valid. That includes rules such as a head element requiring exactly one title element (except in the situations where it does not).

I hope html-proofer isn't relying on Nokogumbo to do that validation, because it never has and is unlikely to do so in the near future, if ever. That said, I agree that functionality would be very useful.
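
To make the distinction concrete, here is a minimal sketch of the kind of attribute-value check that would have to live in a separate validation layer on top of the parsed DOM. It assumes the rule in question is the HTML standard's "valid non-negative integer" rule for img width and height (one or more ASCII digits and nothing else). The helper name non_negative_integer_error is hypothetical, it is not part of Nokogumbo or html-proofer, and the message text only imitates the validator output quoted in this issue.

```ruby
# Hypothetical helper, not part of Nokogumbo or html-proofer.
# Returns nil when the value is a valid non-negative integer
# (one or more ASCII digits), otherwise an error message that
# imitates the validator output quoted in this issue.
def non_negative_integer_error(element, attribute, value)
  return nil if value.match?(/\A[0-9]+\z/)

  # Find the first character that is not an ASCII digit, if any.
  offending = value[/[^0-9]/]
  detail =
    if offending
      "Expected a digit but saw #{offending} instead."
    else
      "The empty string is not a valid non-negative integer."
    end
  "Bad value #{value} for attribute #{attribute} on element #{element}: #{detail}"
end

# Attributes as they appear on the img element in the parsed DOM.
img_attributes = {
  "width"  => "280px",
  "height" => "120px",
  "src"    => "file.jps",
  "alt"    => "drawing"
}

# Only width and height are checked in this sketch.
errors = %w[width height].map do |name|
  value = img_attributes[name]
  non_negative_integer_error("img", name, value) if value
end.compact

puts errors
```

The check runs on attribute values after tree construction has finished, which is why the parser itself has nothing to report for this document.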

thisconnect commented 4 years ago

Unless I'm forgetting something, those rules will construct the following DOM

Yes, that looks correct.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>asdf</title>
  </head>
  <body>
    <img width="280px" height="120px" src="file.jps" alt="drawing">
  </body>
</html>

This errors on https://validator.w3.org/#validate_by_input, which complains: Error: Bad value 280px for attribute width on element img: Expected a digit but saw p instead.

Close this issue?