Closed dwightjack closed 3 years ago
After some inputs from my colleagues, I updated the PR and the description to reflect the changes (https://github.com/pa11y/pa11y-ci/pull/129/commits/c5a67c52117b378693fec0dd217ec505ddbc5c81)
This is a great proposal, so after taking a look at the details I've been thinking about how I'd adapt pa11y-ci-reporter-html
to leverage this interface (instead of just post-processing the JSON as it currently does), and in the process had a few thoughts and suggestions:
pa11y
compatible reporters or represent a new unique interface for pa11y-ci
reporters. From my experience, I think the differences dealing with potentially many URLs being scanned make a unique interface valuable (e.g. I know some people use pa11y-ci
to scan thousands of pages in individual runs). To that end, I'd propose that the responsibility for output be left up to reporters and pa11y-ci
not output to the console.
pa11y-ci-reporter-html
, it's pretty straighforward to leverage existing pa11y reporters for a lot of the work (in these cases for CLI and HTML). Otherwise reporter could save their own data as required.pa11y-reporter-*
philosophy, but may make things cleaner for some of the console output questions already asked, and managing output for multiple reports. For example, in the proposed design, and admittedly I spent limited time going through the code and haven't specifically tested, it wasn't immediately obvious how multiple reporters outputting to the console would be managed (I may have missed it).report
object passed with each reporter call.beforeAll
the reporter has access to it initially, so does it need to be passed with each results
call? It doesn't hurt, but I couldn't see an obvious use if it was already passed at the start.afterAll
be updated to include the full JSON report from pa11y-ci
? That would allow reporters to simply post-process the data if that was easier.@joeyciechanowicz I thought I had read somewhere here or on Slack that the master plan was to integrate the reporters into pa11y, and integrate the pa11y-ci functionality into pa11y, so just wondering what that means for this PR (and maybe others)?
@aarongoldenthal thanks for your feedback!
- The answer to this one may depend on whether this is intended to leverage
pa11y
compatible reporters or represent a new unique interface forpa11y-ci
reporters. From my experience, I think the differences dealing with potentially many URLs being scanned make a unique interface valuable (e.g. I know some people usepa11y-ci
to scan thousands of pages in individual runs). To that end, I'd propose that the responsibility for output be left up to reporters andpa11y-ci
not output to the console.
- If people want the console output, they could simply add the applicable CLI reporter (a new pa11y-ci CLI reporter). From the work I've done for #121, and
pa11y-ci-reporter-html
, it's pretty straighforward to leverage existing pa11y reporters for a lot of the work (in these cases for CLI and HTML). Otherwise reporter could save their own data as required.- This is definitely different than the
pa11y-reporter-*
philosophy, but may make things cleaner for some of the console output questions already asked, and managing output for multiple reports. For example, in the proposed design, and admittedly I spent limited time going through the code and haven't specifically tested, it wasn't immediately obvious how multiple reporters outputting to the console would be managed (I may have missed it).
My initial idea was to re-use pa11y reporters interface as much as possible for consistency and to introduce fewer changes, but I understand your point.
To preserve the current behavior we can have a default reporter which outputs to the CLI in the same way pa11y-ci does now (if I am correct is what you already proposed in #121). The user can then change the reporters to say, JSON to just output to a file or JSON + HTML + CLI to output both JSON and HTML as file while still have the CLI output. I believe this is how it works in mocha (or maybe jest).
- In the interest of separation of concerns, I'd consider leaving state management to the individual reporters, and remove the
report
object passed with each reporter call.
I added the reporter map so that it can be used like a sort of accumulator that is unique for every reporter instance. Since reporters are basically static modules, and supposing that in the future we introduce reporters options, this looked as a good solution to avoid sharing the same accumulator in case the same reporter is used multiple times.
Another solution, if we remove the current behavior (ie: if reporters return something log it to the console), might be to use beforeAll as a factory function returning the accumulator. In this way each reporter can set its initial state independently:
function beforeAll() {
// do something...
// this is then passed to results, afterAll, ...
return new Map()
}
- If the complete array of URLs is passed to
beforeAll
the reporter has access to it initially, so does it need to be passed with eachresults
call? It doesn't hurt, but I couldn't see an obvious use if it was already passed at the start.
In some of my experiments I used those URLs as unique key on a weakMap
used as accumulator. But I get they might not be so useful.
- Could
afterAll
be updated to include the full JSON report frompa11y-ci
? That would allow reporters to simply post-process the data if that was easier.
You're right. It makes sense.
I had some time to throw together a quick example to test and had a few additional comments:
pa11y
reporters implement an error
method that passes the execution error if one occurs. I'd look at adding this so action can be taken if required on error. I suggested the places I added it, but didn't have a chance to look at test updates.begin
function is missing from the "Write a custom reporter" section of the README.results
argument passed to the results
function (I know that came from pa11y and the pa11y reporters, but would love to eliminate that duplication) contains the URL being analyzed in results.pageUrl
, so wonder if that could be eliminated as a separate property of the third argument.In looking at this I put together a quick example using pa11y-reporter-html
to write an HTML report file for each URL (obviously not production-ready, but a quick proof of concept). In working with it I quickly wanted to be able to pass reporter-specific configuration, which I think should be considered (ideally I think of something like the jest reporter configuration). It could also be a growth item as pa11y-ci
currently doesn't limit "extra" configuration parameters and passes them as shown in the example below.
Configuration file:
{
"defaults": {
"reporters": [
"reporters/reporter-test.js"
],
"reporterConfig": {
"reporter-test": {
"savePageReport": true,
"saveSummaryReport": true
}
}
},
"urls": [
"https://weather.com/",
"https://domain.does.not.exist/"
]
}
reporters/reporter-test.js
'use strict';
const filenamifyUrl = require('filenamify-url');
const htmlReporter = require('pa11y-reporter-html');
const fs = require('fs');
const path = require('path');
const report = module.exports = {};
report.results = async (results, _, {config}) => {
// If configuration defined, save report based on setting (default: true)
if (config.reporterConfig &&
config.reporterConfig['reporter-test'] &&
!config.reporterConfig['reporter-test'].savePageReport) {
return;
}
const fileName = path.join('reports', `${filenamifyUrl(results.pageUrl)}.html`);
const htmlReport = await htmlReporter.results(results);
fs.writeFileSync(fileName, htmlReport);
};
report.error = (error, _, {url}) => {
// Process error, but probably not like this, instead log error to summary results
console.error(`Error processing '${url}': ${error.message}`);
};
@aarongoldenthal thanks for your feedback. Sorry I wasn't able to reply earlier.
I agree with your suggestions and will update the PR in the next few days.
I have one question about a topic you shared earlier:
In the interest of separation of concerns, I'd consider leaving state management to the individual reporters, and remove the report object passed with each reporter call.
A possible scenario could be that one reporter is used multiple times with different options. Since reporters are module functions how could we manage different instances of the reporter object?
Maybe we could add a createReport
optional method returning the report initial state and attaching it to the reporter instance we generate in the buildReporter
factory function?
@dwightjack That's a good question, I hadn't considered the case where one reporter was used multiple times. I assume that would only be in the case where there were specific reporter options specific to each instance. My initial thought is that the limitation that we hit is that require
caches the results, so when calling a second time the cached result, i.e. the first instance, is returned. But, that cached value can be invalidated, causing require
to run again and create a new instance.
I ran a quick test with the changes below against a local file and that appeared to work, although I'll admit it probably deservers a little more thought.
module.exports = function resolveReporters(reporters) {
return [].concat(reporters).map(reporter => {
if (typeof reporter !== 'string') {
return undefined;
}
try {
delete require.cache[require.resolve(reporter)];
return require(reporter);
} catch (_) {
const localModule = path.isAbsolute(reporter) ?
reporter : path.resolve(process.cwd(), reporter);
if (fs.existsSync(localModule)) {
delete require.cache[require.resolve(localModule)];
return require(localModule);
}
console.error(`Unable to load reporter "${reporter}"`);
return undefined;
}
}).filter(Boolean).map(reporterModule => {
return buildReporter(reporterModule);
});
};
@aarongoldenthal I didn't think about that solution. We could use import-fresh for the task, which should be a little bit more optimized.
A reporter option would be awseome!
@aarongoldenthal @joeyciechanowicz Sorry for the delay in updating this PR, I've pushed the changes we discussed earlier (hope I didn't miss any point):
./libs/reporters/cli
which is loaded by default. buildReporters
logic. If a reporter wants to log something to the console it should implement its own logic (this will make pa11y-ci reporters behave differently from pa11y reporters).For the scenario with the same reporter used multiple times (and with custom options), I thought that maybe exporting as default export a factory function would have been the more intuitive option (instead of using import-fresh
). In this way, give the following 'file-reporter' reporter module:
module.exports = (options) => {
const filename = options.filename
return {
afterAll(report) {
fs.writeFile(filename, ...);
}
}
}
we could configure it to write two files:
// configuration file
{
"defaults": {
"reporters": [
"pa11y-ci/libs/reporters/cli", // preserve the CLI output
["file-reporter", { "fileName": "./my-report.json" }],
["file-reporter", { "fileName": "./my-other-report.json" }]
]
},
"urls": [
...
]
}
The array syntax is the same used by Jest, so it should be enough familiar to developers.
I thought I had read somewhere here or on Slack that the master plan was to integrate the reporters into pa11y, and integrate the pa11y-ci functionality into pa11y, so just wondering what that means for this PR (and maybe others)?
@aarongoldenthal Apologies for the slow reply. Yes, the aim is to merge the reporters into pa11y
. This is so we we can reduce the number or reporters and thus the overhead in releasing future versions. I'm not sure if pa11y-ci
will be merged as well. pa11y
will continue to support reporters in the same way it does currently, just we'll have some special paths for the internal reporters. This work is 100% useful, as provides a way of handling specialised reporter for CI (I'm dreaming of a Grafana one 😁 ).
exporting as default export a factory function would have been the more intuitive option
That gets my vote!
Regarding the issue of console output, I think that pa11y-ci
should not log anything and it's up to a console-reporter
. Though adding a debug
flag to it which logged runtime information would useful (which would then align with pa11y
).
@dwightjack This is great. After using it for a week and throwing together a couple of example reporters it's really straightforward, and I agree the reporter configuration option is very familiar and easy to work with. I did have some other observations (which really are intended not to be critical but to show how useful this is). I hoped to have some code for some of these, but got sidetracked working #132 for GitHub Actions.
JSON
With the reporter interface it seems like the JSON option could/should be converted to a reporter. At the moment you can get yourself into some messy output with reporters and --json
specified since they're treated somewhat separately, but the JSON is written to the console. If everyone agrees that's the right answer I'm happy to help with that (or not). One of the example reporters I created was for JSON, and with no configuration outputs to the console like the current implementation, but there's a fileName
option to save directly to a file.
The other questions with that is what to do with the -j
/--json
CLI option. If it were me I'd remove it and only allow it to be specified as a reporter, but I'm not sure if there are concerns about a breaking change (at the moment with the defaults it doesn't seem like it's breaking, but maybe I missed something). The option could always be left and just set json
as the reporter.
Built-In Modules
Should there be an abbreviated name syntax for the included reporters (e.g. cli
, json
)? Unless I missed it, it looks like the only way to reference them is to point to the file path directly. @joeyciechanowicz, I know you were looking at integrating the reporters into pa11y
, so I wasn't sure if this scenario had already been considered and there was a plan (it seems like that approach should be consistent).
Tests
I didn't see an issue running multiple reporters (and at one point I was running 3, two of which were instances of the same reporter with different configuration), but I didn't see a test to check that all reporters are called if multiple are specified, which seemed valuable.
Reporter Interface
The pa11y
reporter interface include a semver
formatted supports
property to validate compatibility, which got me thinking about it here. I can see pros and cons, especially if pa11y
reporters are used here, but thought I'd throw the idea out.
Default Configuration
In digging through a lot of the pa11y-ci
code earlier for the other issue, it seems like there's some duplication in setting defaults between bin/pa11y-ci.js
and lib/pa11y-ci.js
/lib/helpers/defaults.js
that could be consolidated (nothing broken).
I add this comment for reference. @aarongoldenthal implemented most of the ideas from the previous comment ( https://github.com/pa11y/pa11y-ci/pull/129#issuecomment-826442570) in https://github.com/dwightjack/pa11y-ci/pull/1.
To reply to some of the remaining points:
Tests
I didn't see an issue running multiple reporters (and at one point I was running 3, two of which were instances of the same reporter with different configuration), but I didn't see a test to check that all reporters are called if multiple are specified, which seemed valuable.
I'm going to implement those tests in the next few days. I am not sure if they would fit better as integration of unit tests. I will investigate that.
Reporter Interface
The pa11y reporter interface include a semver formatted supports property to validate compatibility, which got me thinking about it here. I can see pros and cons, especially if pa11y reporters are used here, but thought I'd throw the idea out.
Adding a new property is not a problem. I am wondering if it is really useful since we can use peerDependency
in package.json
for that. :thinking:
in Default Configuration
In digging through a lot of the pa11y-ci code earlier for the other issue, it seems like there's some duplication in setting defaults between bin/pa11y-ci.js and lib/pa11y-ci.js/lib/helpers/defaults.js that could be consolidated (nothing broken).
Agree. Navigating the defaults and how they are merged and used got me confused at first. I will review that now that I am a bit more confident with the codebase.
@aarongoldenthal I added both unit and integration tests for reporters. (https://github.com/pa11y/pa11y-ci/pull/129/commits/1d7010bdf64441fc5d923de7877dc6fb43784155 https://github.com/pa11y/pa11y-ci/pull/129/commits/460c008d57f70ea0451ae188cd15b2c00941eefc) I also updated the json
reporter to ensure that the folder where the file get stored exists and to resolve the fileName
parameter to cwd
(https://github.com/pa11y/pa11y-ci/pull/129/commits/d7a26320af49150a1fb26333786b6042f255f1d7). For this change I used fs.mkdirSync(dirPath, {recursive: true});
with the recursive
option not being supported in Node 8. Anyway I am not sure we should add more code or libraries to support a version of Node that reached End of Life more than 1.5 years ago.
In the end, I didn't refactor the configuration resolution. Maybe it's not so readable but I cannot see a better pattern to make it work with the two usage scenarios (CLI and programmatic) of the library 🤔 .
@dwightjack I made a couple of comments, but overall it looks really good to me. I looked again at the configuration and agree it seems like the best way to handle both scenarios (although it did lead to me comment in the README). I'm also a supporter of deprecating old Node versions, so I wouldn't be worried about Node 8 support (and assuming pally-ci
follows the just-released pa11y
v6 compatibility, it will be Node 12+).
The only question that still lingers in my mind is potentially dropping the --json
CLI argument. I can see both sides, but my vote would be to remove because it now seems inconsistent with specifying other output/reporting. If this PR makes the next pa11y-ci
release, and assuming it will update to pa11y
v6, then it'll be a major release and a good point to do it.
@aarongoldenthal thanks! I've merged your suggestions.
About the --json
flag, if we plan to release this feature in a major release, then I agree we can remove it. If so:
@aarongoldenthal thanks! I've merged your suggestions.
About the
--json
flag, if we plan to release this feature in a major release, then I agree we can remove it.
If it is not too much hassle it would be nice to deprecate the flag at first and remove it only in a major version number change (ie 3.0) as it is a breaking change.
@dwightjack It looks good to me. And I think it makes sense to look at deprecating --json
as a separate PR, which leaves open either option. @joeyciechanowicz?
@joeyciechanowicz thanks to you and @aarongoldenthal for the help and great feedback.
This PR adds support for Pa11y compatible reporters into Pa11y CI.
Why
Pa11y CI reports omit the violations for URLs below the error threshold. In my project, I want to log those violations as well. Instead of modifying the core behavior, I think it's a good idea to add reporters support so that users can use their own strategies to transform, store and analyze the results.
Reporters use the same interface as pa11y reporters with some additions:
report
argument. This object is an instance of a JavaScript Map that can be used to initialize and collect data across each tested URL.beforeAll(urls, report)
andafterAll(urls, report)
optional methods called respectively at the beginning and at the very end of the process with the following arguments:urls
: the URLs array defined in your configreport
: the report objectresults()
method receives a thirdoption
argument with the following properties:config
: the current URL configuration objecturl
: the current URL under testurls
: the URLs array defined in your configChanges
--reporter
CLI option to define a single reporterdefaults.reporters
configuration array to define multiple reporterspa11y-ci ... > my-file.json
)