Robbert closed this issue 8 years ago.
The first part of HTML validation is finding the actual HTML files in the repository: either look for all files whose extensions map to `text/html`, or simply glob for `**/*.html`. This already entails two steps: 1) recursively collecting the paths of all files and 2) filtering that list of paths against a pattern.
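A minimal sketch of those two steps in Node.js, assuming the `glob` npm package (which performs the recursion and the filtering in a single call):

```js
// Minimal sketch, assuming the `glob` npm package is installed.
const glob = require('glob');

// Recursively collects paths and filters them against the pattern in one call.
glob('**/*.html', { ignore: 'node_modules/**' }, (err, files) => {
  if (err) throw err;
  console.log(files); // e.g. [ 'index.html', 'docs/about.html' ]
});
```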
Gulp and Grunt already have good support for finding files using "glob"-style file patterns.
So, if we'd go down that path, we would need a Gulp or Grunt plugin for HTML linting. One example is grunt-html-lint, which uses htmlparser2. That parser advertises itself as "a forgiving HTML/XML/RSS parser", which seems highly undesirable for continuous integration: a forgiving parser might be fine for catching certain errors quickly and cheaply, but a more rigorous validation is required in the end.
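To illustrate the concern (a sketch, not from the thread): htmlparser2 happily consumes invalid markup without reporting anything, so broken nesting would sail through CI unnoticed.

```js
// Sketch: htmlparser2 is forgiving by design, so this invalid nesting
// produces no error at all — `onerror` never fires for this input.
const { Parser } = require('htmlparser2');

const parser = new Parser({
  onerror(err) {
    console.error('parse error:', err); // never reached here
  },
});
parser.write('<p><div>invalid nesting</div></p>');
parser.end();
```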
Looking at the `html-lint` package now; the first thing I noticed is that it requires PhantomJS, which means it probably won't run on Wercker out of the box.
The best HTML validator is probably the Nu Html Checker written by Henri Sivonen and Mike Smith, and it turns out there is a Gulp plugin readily available!
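Such a Gulp task might look like the sketch below; the plugin name `gulp-html-validator` and its API are assumptions for illustration, not taken from this thread.

```js
const gulp = require('gulp');
// Hypothetical plugin name/API, used only to sketch the shape of the task.
const htmlValidator = require('gulp-html-validator');

gulp.task('validate-html', () =>
  gulp.src('**/*.html')
    .pipe(htmlValidator())
);
```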
The validator works splendidly; however, we probably want to enforce style rules that go beyond simply committing valid HTML, such as requiring an explicit text encoding. The next step is to find a linter that can perform these additional checks.
No such linter appears to exist. Let's start with simply loading all `*.html` files with `jsdom` and performing a couple of CSS queries.
Indeed, there aren't any good validators that use `jsdom` for querying the DOM. It's probably best to write a simple one that checks our specific needs, such as detecting a missing `meta[charset=utf-8 i]`.
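A minimal sketch of such a check, assuming the `glob` and `jsdom` npm packages; it compares the charset case-insensitively in plain JavaScript rather than relying on the `i` attribute-selector flag:

```js
const fs = require('fs');
const glob = require('glob');
const { JSDOM } = require('jsdom');

let failed = false;
for (const file of glob.sync('**/*.html', { ignore: 'node_modules/**' })) {
  const dom = new JSDOM(fs.readFileSync(file, 'utf8'));
  const meta = dom.window.document.querySelector('meta[charset]');
  // Equivalent to the selector `meta[charset=utf-8 i]`, done in plain JS.
  if (!meta || meta.getAttribute('charset').toLowerCase() !== 'utf-8') {
    console.error(`${file}: missing <meta charset="utf-8">`);
    failed = true;
  }
}
process.exit(failed ? 1 : 0);
```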
I have seen @kangax doing some great work with his HTML minifier, and I just learned he also worked on an HTML checker based on earlier work on the minifier: https://github.com/kangax/html-lint
Should we check all `text/html` files for `<meta charset="UTF-8">`?