w3c / specberus

Checker used at W3C to validate the compliance of Technical Reports with publication rules
https://www.w3.org/pubrules/
MIT License

Specberus

Specberus is a checker used at W3C to validate the compliance of Technical Reports with publication rules.

  1. Installation
  2. Running
  3. Testing
  4. JS API
  5. REST API
  6. Profiles
  7. Validation events
  8. Writing rules

1. Installation

Specberus is a Node.js application, distributed through npm. Alternatively, you can clone the repository and install all the dependencies by running:

$ npm install -d

This requires a reasonably recent version of Node.js.

2. Running

Currently there is no shell to run Specberus. Later we will add both Web and CLI interfaces based on the same core library.

Syntax and command-line parameters

$ npm start [PORT]

Meaning of positional parameters:

  1. PORT: where Specberus will be listening for HTTP connections. (Default 80.)

Examples:

$ npm start
$ npm start 3001

Set the environment variable DEBUG to run in debug mode instead:

$ DEBUG=true npm run start

This modifies the behaviour of certain parts of the application to facilitate debugging. For example, CSS and JS resources are not loaded in their minified/uglified forms (the web UI loads bootstrap.css, bootstrap.js and jquery.js instead of bootstrap.min.css, bootstrap.min.js and jquery.min.js).

If Specberus is not going to be served from the root directory of a domain, or if it will be served through a proxy, also set BASE_URI to the public root URI of Specberus, e.g.:

$ BASE_URI=https://spec-store.com/check/ npm start
$ BASE_URI=/hostname/can/be/omitted/ npm start 88

Auto reload when developing

Run npm run live when developing. The app will automatically reload when changes happen.

$ npm run live

$ npm run live 3001

3. Testing

1. Simple test

Testing is done using mocha. Simply run:

$ mocha

from the root and you will be running the test suite. Mocha can be installed with:

$ npm install -g mocha

2. SKIP_NETWORK

Some of the tests can on occasion take a long time, or fail outright because a remote service is unavailable. To work around this, you can set SKIP_NETWORK:

$ SKIP_NETWORK=1 mocha
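
Inside a test file, a guard along the following lines could decide whether network-dependent cases run. This is only a minimal sketch, not the project's actual test code: the helper name shouldSkipNetwork is hypothetical, and treating any non-empty value of SKIP_NETWORK as "skip" is an assumption.

```javascript
// Hypothetical helper: decide whether network-dependent tests should be skipped.
// Assumption: any non-empty SKIP_NETWORK value means "skip".
function shouldSkipNetwork(env = process.env) {
    return Boolean(env.SKIP_NETWORK);
}

// Usage inside a mocha suite (mocha provides `it` as a global):
//   const networkIt = shouldSkipNetwork() ? it.skip : it;
//   networkIt('checks remote links', () => { /* ... */ });
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.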

3. Run testserver

The test documents can also be served independently by running the test server:

$ npm run testserver

4. Run certain test

To run a single test, change describe() to describe.only() in the test file and set the relevant environment variables before npm run test.

// test/rules.js
describe.only('Making sure Specberus is not broken...', () => {
    // ... existing test cases ...
});

The following example runs the test only for the http://localhost:8001/doc-views/TR/Recommendation/WD?rule=copyright&type=noCopyright document:

$ RULE=copyright TYPE=noCopyright PROFILE=WD npm run test

The following example runs tests against all the documents, but limits them to the copyright rule with the noCopyright data:

$ RULE=copyright TYPE=noCopyright npm run test

4. JS API

The interface you get when you require("specberus") is that from lib/validator. It returns a Specberus instance that is properly configured for operation in the Node.js environment (there is nominal support for running Specberus under other environments, but it isn't usable at this time).

(See also the REST API.)

Creating a Validator instance

const { Specberus } = require('specberus');
const specberus = new Specberus(apiKey);
// specberus.validate(...)
// specberus.extractMetadata(...)

validate(options)

This method takes an object with the following fields:

extractMetadata(options)

This method eventually extends the Specberus instance (this) with metadata inferred from the document. Once the end-all event is emitted, the metadata is available in a new property called meta.

The options accepted are equal to those in validate(), except that a profile is not necessary and will be ignored (finding out the profile is one of the goals of this method).

this.meta will be an Object and may include up to 16 properties described below:

If some of these pieces of metadata cannot be deduced, that key will not exist, or its value will not be defined.

This is an example of the value of Specberus.meta after the execution of Specberus.extractMetadata():

{
    "profile": "WD",
    "title": "Title of the spec",
    "docDate": "2016-2-3",
    "thisVersion": "https://www.w3.org/TR/2016/WD-foobar-20160203/",
    "latestVersion": "https://www.w3.org/TR/foobar/",
    "previousVersion": "https://www.w3.org/TR/2015/WD-foobar-20150101/",
    "editorsDraft": "https://w3c.github.io/foobar/",
    "delivererIDs": [123, 456],
    "editorIDs": [12345],
    "informative": false,
    "process": "https://www.w3.org/2015/Process-20150901/"
}
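
As a rough illustration of the flow described above, the sketch below waits for end-all and then reads meta defensively, since any key may be missing. It rests on assumptions: the event sink is passed in an option named events (the option list is not reproduced here), and readMetadata and describeMeta are hypothetical helper names, not part of Specberus.

```javascript
const { EventEmitter } = require('events');

// Run extractMetadata() and hand the resulting `meta` object to `onDone`.
// Assumption: the event sink is passed in an option named `events`.
function readMetadata(specberus, url, onDone) {
    const sink = new EventEmitter();
    // 'end-all' signals that the metadata is available in `specberus.meta`.
    sink.on('end-all', () => onDone(specberus.meta || {}));
    specberus.extractMetadata({ url, events: sink });
}

// Some keys may be missing, so fall back to a placeholder for each one.
function describeMeta(meta) {
    const profile = meta.profile ?? 'unknown profile';
    const title = meta.title ?? 'untitled';
    const editors = (meta.editorIDs ?? []).length;
    return `${title} [${profile}], ${editors} editor ID(s)`;
}

// Possible usage with a real instance:
//   const { Specberus } = require('specberus');
//   readMetadata(new Specberus(apiKey), 'https://example.com/doc.html',
//       (meta) => console.log(describeMeta(meta)));
```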

5. REST API

In addition to the JS API, Specberus exposes a REST API over HTTP.

The endpoint is <host>/api/. Use either url or file to pass along the document (neither source nor document is allowed).

Note: If you want to use the public W3C instance of Specberus, you can replace <host> with https://www.w3.org/pubrules.

The different endpoints are described below.

version (GET)

Returns the version string, e.g. 1.5.3.

metadata (GET and POST)

Extract all known metadata from a document; see below for information about the return value.

validate (GET and POST)

Check the document (syntax). Many of the options understood by the JS method validate are accepted.

The special profile auto is also available.

Examples

1. Get API version of Pubrules

curl https://www.w3.org/pubrules/api/version

2. Get metadata of one document.

# GET
curl "https://www.w3.org/pubrules/api/metadata?url=https://example.com/doc.html"

# POST
curl "https://www.w3.org/pubrules/api/metadata" -F "file=@/tmp/foo.html"

Metadata is data extracted from the document. It includes, among other things, the type (profile) of the document, its publication date, the editors' names, and the version of the Patent Policy the document is under.

e.g. https://www.w3.org/pubrules/api/metadata?url=https://www.w3.org/TR/2021/WD-i18n-glossary-20210708/
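
The same GET request can be issued from Node.js. The sketch below is one possible way to do it, not an official client: metadataUrl and fetchMetadata are hypothetical helper names, and the global fetch it relies on requires Node.js 18 or later.

```javascript
// Base of the public instance mentioned above; replace with your own <host>.
const PUBRULES = 'https://www.w3.org/pubrules';

// Build the GET URL for the metadata endpoint.
// URLSearchParams percent-encodes the document URL, so characters such as
// "&" or "?" inside it cannot be mistaken for API parameters.
function metadataUrl(docUrl, host = PUBRULES) {
    const query = new URLSearchParams({ url: docUrl });
    return `${host}/api/metadata?${query}`;
}

// Fetch and parse the JSON response. The HTTP status is not inspected here,
// so callers should check the returned object themselves.
async function fetchMetadata(docUrl) {
    const response = await fetch(metadataUrl(docUrl));
    return response.json();
}
```

Encoding the document URL matters when it carries its own query string; unencoded, its parameters would be read as parameters of the API call.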

3. Validate the document using profile: auto

# GET
curl "https://www.w3.org/pubrules/api/validate?url=https://example.com/doc.html&profile=auto"

# POST
curl "https://www.w3.org/pubrules/api/validate" -F "file=@/tmp/foo.html" -F "profile=auto"

Note: The POST method skips some checks that require the document to be staged online, such as checking whether assets are in the same folder.

The auto profile is the easiest way to validate a document. The validation relies on the automatically extracted metadata.

The validation result contains both the metadata and the errors/warnings regarding the document.

e.g. https://www.w3.org/pubrules/api/validate?url=https://www.w3.org/TR/2021/WD-i18n-glossary-20210708/&profile=auto

4. Validate the document using manual configs

https://www.w3.org/pubrules/api/validate?url=https://example.com/doc.html&profile=WD&validation=simple-validation&patentPolicy=pp2020

Pubrules supports advanced configs to make the validation more accurate.

Config | Explanation | Supported values
validation | Recursively validate multipart documents | no-validation, simple-validation, recursive
informativeOnly | If the document is informative | true, false
echidnaReady | Check that the document is valid for automatic publication with Echidna | true, false
patentPolicy | Patent Policy version | pp2020, pp2004

e.g. https://www.w3.org/pubrules/api/validate?url=https://www.w3.org/TR/2021/WD-i18n-glossary-20210708/&profile=WD&validation=simple-validation
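
When such URLs are assembled in code, it can help to reject values the table above does not list before sending the request. The following is a minimal sketch under that idea; validateUrl is a hypothetical helper, and the client-side check is a convenience, not something the API requires.

```javascript
// Supported values, copied from the table above.
const SUPPORTED = new Map([
    ['validation', ['no-validation', 'simple-validation', 'recursive']],
    ['informativeOnly', ['true', 'false']],
    ['echidnaReady', ['true', 'false']],
    ['patentPolicy', ['pp2020', 'pp2004']],
]);

// Build a validate URL, rejecting configs or values the table does not list.
function validateUrl(docUrl, profile, configs = {}, host = 'https://www.w3.org/pubrules') {
    const query = new URLSearchParams({ url: docUrl, profile });
    for (const [key, value] of Object.entries(configs)) {
        const allowed = SUPPORTED.get(key);
        if (!allowed) {
            throw new Error(`Unknown config: ${key}`);
        }
        if (!allowed.includes(String(value))) {
            throw new Error(`Unsupported value for ${key}: ${value}`);
        }
        query.set(key, String(value));
    }
    return `${host}/api/validate?${query}`;
}

// Possible usage:
//   validateUrl('https://example.com/doc.html', 'WD',
//       { validation: 'simple-validation', patentPolicy: 'pp2020' });
```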

Return values

Methods metadata and validate return a JSON object with these properties:

If there is an internal error, the document cannot be retrieved or is not recognised, or validation fails, both methods return HTTP status code 400. In the case of validate, success is also false and errors.length > 0.

This is an example of a successful validation of a document, with profile auto:

{
    "success": true,
    "errors": [],
    "warnings": [
        "headers.ol-toc",
        "links.linkchecker",
        "links.compound",
        "headers.dl"
    ],
    "info": [
        "structure.display-only",
        "structure.display-only",
        "structure.display-only",
        "validation.wcag"
    ],
    "metadata": {
        "profile": "WD",
        "title": "Character Model for the World Wide Web: String Matching and Searching",
        "docDate": "2016-4-7",
        "thisVersion": "https://www.w3.org/TR/2016/WD-charmod-norm-20160407/",
        "latestVersion": "https://www.w3.org/TR/charmod-norm/",
        "previousVersion": "https://www.w3.org/TR/2015/WD-charmod-norm-20151119/",
        "editorsDraft": "https://w3c.github.io/charmod-norm/",
        "delivererIDs": [32113],
        "editorIDs": [33573],
        "informative": false,
        "process": "https://www.w3.org/2015/Process-20150901/",
        "url": "https://www.w3.org/TR/2016/WD-charmod-norm-20160407/"
    }
}

When the profile is given by the user (instead of being set to auto), fewer items of metadata are returned.

metadata returns a similar structure, where all values are empty arrays, except for the key metadata which contains the metadata object.
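
A caller typically needs only a few fields from a validate response. The sketch below condenses an object shaped like the example above into one line; summarise is a hypothetical helper, and the fallbacks for missing keys are an assumption about how defensive a client may want to be.

```javascript
// Summarise a parsed response from the validate endpoint.
function summarise(result) {
    const errors = result.errors ?? [];
    const warnings = result.warnings ?? [];
    // The profile is reported inside `metadata` when it is available.
    const profile = result.metadata?.profile ?? 'unknown';
    const verdict = result.success ? 'PASS' : 'FAIL';
    return `${verdict} (${profile}): ${errors.length} error(s), ${warnings.length} warning(s)`;
}
```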

6. Profiles

Profiles are simple objects that support the following API:

A profile is basically a configuration of what to check. You can load a specific profile from under lib/profiles or create your own.

The current hierarchy of profiles follows. Each profile inherits all rules from its parent profile. Profiles that are identical to their parent profile, i.e. that do not add any new rules, are marked too.
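
The inheritance described above can be pictured with a small model. This is purely illustrative and does not reflect how the modules under lib/profiles are written: the parent and rules fields, the rule names, and the WD-variant profile are all hypothetical.

```javascript
// Hypothetical profiles: each one lists only the rules it adds itself.
const base = { name: 'base', parent: null, rules: ['example.rule-a'] };
const wd = { name: 'WD', parent: base, rules: ['example.rule-b'] };
// A profile that adds nothing is identical to its parent.
const wdVariant = { name: 'WD-variant', parent: wd, rules: [] };

// Collect the rules of a profile, starting with those of its ancestors.
function effectiveRules(profile) {
    const inherited = profile.parent ? effectiveRules(profile.parent) : [];
    return [...inherited, ...profile.rules];
}
```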

7. Validation events

For a given checking run, the event sink you specify will be receiving a bunch of events as indicated below. Events are shown as having parameters since those are passed to the event handler.
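
An event sink can be an ordinary Node.js EventEmitter with handlers attached. The sketch below collects what a run reports and hands it over at the end. Only end-all is taken from this document; the event names err and warning, their (rule, data) parameters, and the helper name createSink are assumptions made for illustration.

```javascript
const { EventEmitter } = require('events');

// Build an event sink that records what a checking run reports.
// Assumption: apart from 'end-all', the event names ('err', 'warning') and
// their (rule, data) parameters are illustrative.
function createSink(onFinished) {
    const sink = new EventEmitter();
    const report = { errors: [], warnings: [] };
    sink.on('err', (rule, data) => report.errors.push({ rule, data }));
    sink.on('warning', (rule, data) => report.warnings.push({ rule, data }));
    // Hand over everything collected once the run is complete.
    sink.on('end-all', () => onFinished(report));
    return sink;
}
```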

8. Writing rules

Rules are simple modules that just expose a check(sr, cb) method. They receive a Specberus object and a callback, use the Specberus object to fire validation events and call the callback when they're done.
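
A rule module of the shape just described might look like the sketch below. It is a hedged illustration rather than a real rule: the rule name is made up, and both sr.$ (a jQuery-like selector over the document) and sr.error(name, key) are assumed signatures that should be checked against the actual Specberus API.

```javascript
// Hypothetical rule: report an error when the document has no <title> text.
const name = 'example.has-title'; // made-up rule name

function check(sr, cb) {
    // Assumption: `sr.$` behaves like a jQuery selector function.
    const title = sr.$('title').text().trim();
    if (!title) {
        // Assumption: `sr.error(ruleName, key)` fires an error event.
        sr.error(name, 'not-found');
    }
    // Always call the callback so the run can move on to the next rule.
    cb();
}

module.exports = { name, check };
```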

The Specberus object exposes the following API that's useful for validation: