wesaynih / infrastructure

© 2016 The Knights Who Say NIH — Do NOT fork this repository without permission.

Add HTML validation to testing pipeline #13

Closed: Robbert closed this issue 8 years ago

Robbert commented 8 years ago

I have seen @kangax do some great work on his HTML minifier, and I just learned that he also built an HTML checker on top of that earlier work: https://github.com/kangax/html-lint

Robbert commented 8 years ago

The first part of HTML validation is finding the actual HTML files in the repository: either look for all files whose extension maps to text/html, or simply look for **/*.html. This already entails two steps: 1) recursively collecting the paths of all files and 2) filtering that list of paths against a pattern.
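The two steps described above can be sketched with nothing but the Node.js standard library. This is only an illustration, not what the pipeline ended up using: the function names `listFiles` and `findHtmlFiles` are made up for this example, and skipping `node_modules` and `.git` is an assumption about which directories should be ignored.

```javascript
const fs = require('fs');
const path = require('path');

// Step 1: recursively collect the paths of all regular files below `root`.
function listFiles(root) {
  const paths = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      // Assumption: these directories never contain our own HTML sources.
      if (entry.name === 'node_modules' || entry.name === '.git') continue;
      paths.push(...listFiles(full));
    } else if (entry.isFile()) {
      paths.push(full);
    }
  }
  return paths;
}

// Step 2: filter the list against a pattern, roughly equivalent to **/*.html.
function findHtmlFiles(root) {
  return listFiles(root)
    .filter((p) => path.extname(p) === '.html')
    .sort();
}
```

Calling `findHtmlFiles('.')` from the repository root would return the sorted list of `.html` paths, which can then be handed to whichever validator is chosen.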

Gulp and Grunt are tools that already nicely support finding files using "glob"-like file patterns.

So, if we go down that path, we need a Gulp or Grunt plugin for HTML linting. One example is grunt-html-lint, which uses htmlparser2. That parser advertises itself as "A forgiving HTML/XML/RSS parser". A forgiving parser seems highly undesirable for continuous integration, because it will silently accept markup that a strict validator would reject. It might be okay for catching certain errors quickly without much CPU effort, but a more rigorous validation is required in the end.

Robbert commented 8 years ago

I am looking at the html-lint package now. The first thing I noticed is that it requires PhantomJS, which means it probably doesn't run on Wercker out of the box.

Robbert commented 8 years ago

The best HTML validator is probably the one written by Henri Sivonen and Mike Smith (the Nu Html Checker), and it turns out there is a Gulp plugin readily available!

Robbert commented 8 years ago

The validator works splendidly. However, we probably want to enforce style rules that go beyond simply committing a valid HTML file, such as requiring that the text encoding is specified. The next step is to find a linter that can perform these additional checks.

Robbert commented 8 years ago

No such linter appears to exist. Let's start by simply loading all *.html files with jsdom and running a couple of CSS queries against them.

Robbert commented 8 years ago

Indeed, there aren't any good validators that use jsdom for querying the DOM. It is probably best to write a simple one that checks our specific needs, such as detecting a missing meta[charset=utf-8 i].
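A check like the one described above could look roughly as follows. This is a minimal sketch, not the checker that was eventually written: `checkCharset` is a hypothetical name, and the function accepts any object that exposes the DOM `querySelector` API so that it does not depend on a particular jsdom version. Instead of relying on the case-insensitive `i` flag in the selector, it selects `meta[charset]` and compares the lowercased value, which is an assumption made to avoid depending on selector-engine support for that flag.

```javascript
// Hypothetical helper: returns a list of problems found in one parsed document.
// `document` is anything exposing querySelector, for example a jsdom document.
function checkCharset(document) {
  const meta = document.querySelector('meta[charset]');
  if (meta === null) {
    return ['missing <meta charset="utf-8">'];
  }
  // Compare case-insensitively, mirroring the `i` flag of the selector.
  const value = (meta.getAttribute('charset') || '').toLowerCase();
  return value === 'utf-8' ? [] : ['unexpected charset: ' + value];
}
```

With the `jsdom` package installed, a document for a file's contents `html` can be obtained in current jsdom releases via `new JSDOM(html).window.document`, where `JSDOM` comes from `require('jsdom')`. An empty result means the file passes; any returned message can be reported as a CI failure.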