w3c / aria-at-automation-harness

A command-line utility for executing test plans from w3c/aria-at without human intervention using the aria-at-automation-driver
5 stars 3 forks source link

Test results data structure design #10

Closed jugglinmike closed 2 months ago

jugglinmike commented 2 years ago

This project currently emits data collected from executing tests as JSON-formatted text. I'd like to discuss improvements to the structure and content of those reports.

Here's what the data structure currently looks like (expressed using JSDoc-style JavaScript code comments):

/**
 * Result from a test plan.
 * @typedef AriaATCIData.TestPlanResult
 * @property {string} name name of the test plan, defaults to 'unknown'
 * @property {AriaATCIData.Log[]} log debug messages emitted during execution of test plan
 * @property {object[]} tests
 * @property {string} tests[].filepath filepath of file describing the test in the test plan
 * @property {AriaATCIData.Log[]} tests[].log subset of log emitted during this single test
 * @property {AriaATCIData.TestResult[]} tests[].results
 */

/**
 * Result from a single test in a test plan.
 * @typedef AriaATCIData.TestResult
 * @property {number} testId numeric id of a test in a test plan
 * @property {object[]} results permutation of input commands and assertions passing or not passing
 * @property {string} results[].command id of input command sent to system
 * @property {string} results[].expectation description of expected assertion
 * @property {boolean} results[].pass did command pass or not pass expectation
 */

@mzgoddard As described, the two properties named log sound like they may be redundant. Could the "debug messages emitted during execution of test plan" be recreated from the "subset of log emitted during [each] single test"?

Some suggestions:

mzgoddard commented 2 years ago

Could the "debug messages emitted during execution of test plan" be recreated from the "subset of log emitted during [each] single test"?

Yes, currently the larger set can be recreated from the sets contained in the individual tests.

  • Lets add timestamps for the moment that testing started and ended

:+1:

  • Because individual tests can be interrupted for any number of exceptional circumstances, and because those circumstances do not necessarily reflect a failure to meet expectations, results should be modeled with at least three states: "pass", "fail", and "error". The "fail" and "error" states generally need to be further contextualized with a description of the failure or exception, warranting an additional property.

Agreed.

This reminds me an omission in #6. The TestResult in that PR changes to include a set of commands and collected speech output. That change is reflected in the jsdoc at the end of this comment. (Its changed in the agent but not in the jsdoc as of this comment.)

  • We'll eventually want to include information on the system under test (i.e. the names and versions of the operating system, web browser, and screen reader). While we work out the details of how those values are inferred, it might be good to include null values in the data structure, if only as a signal of intent

I imagine this will initially be a configured property. Inference will be nice to reach, but it'll be less work to just instruct the host what is being used. Maybe it should be field reported on test plans that is partly or wholly configurable, and inference could be done by another tool as part of the image.


 /**
  * Result from a single test in a test plan.
  * @typedef AriaATCIData.TestResult
  * @property {number} testId numeric id of a test in a test plan
+ * @property {object[]} commands input commands and the speech emitted
+ * @property {string} commands[].command id of input command sent to system
+ * @property {string} [commands[].output] speech emitted
+ * @property {string[]} [commands[].errors] errors that occured while during command
  * @property {object[]} results permutation of input commands and assertions passing or not passing
  * @property {string} results[].command id of input command sent to system
  * @property {string} results[].expectation description of expected assertion
  * @property {boolean} results[].pass did command pass or not pass expectation
  */
jugglinmike commented 2 years ago

Could the "debug messages emitted during execution of test plan" be recreated from the "subset of log emitted during [each] single test"?

Yes, currently the larger set can be recreated from the sets contained in the individual tests.

What do you think about removing the larger set?

This reminds me an omission in #6. The TestResult in that PR changes to include a set of commands and collected speech output. That change is reflected in the jsdoc at the end of this comment. (Its changed in the agent but not in the jsdoc as of this comment.)

Is it accurate to say that this information is ancillary to the test results? Is there any record of the design process which motivated this feature?

jugglinmike commented 2 years ago

Additionally, the design would be more intuitive if it had stronger differentiation in terminology. The word "result" is currently used to describe three distinct concepts:

In this domain, a "test plan" is a collection of "tests," so maybe it works for "test plan results" to contain multiple "test results." The third use of the term is harder to understand, though.

@mzgoddard why is the test ID a property of the TestResult (i.e. result.tests[0].results[0].testId) rather than a property of the test itself (i.e. result.tests[0].testId)? If we can move it like this:

 /**
  * Result from a test plan.
  * @typedef AriaATCIData.TestPlanResult
  * @property {string} name name of the test plan, defaults to 'unknown'
  * @property {AriaATCIData.Log[]} log debug messages emitted during execution of test plan
  * @property {object[]} tests
  * @property {string} tests[].filepath filepath of file describing the test in the test plan
+ * @property {number} tests[].id numeric id of a test in a test plan
  * @property {AriaATCIData.Log[]} tests[].log subset of log emitted during this single test
  * @property {AriaATCIData.TestResult[]} tests[].results
  */

 /**
  * Result from a single test in a test plan.
  * @typedef AriaATCIData.TestResult
- * @property {number} testId numeric id of a test in a test plan
  * @property {object[]} results permutation of input commands and assertions passing or not passing
  * @property {string} results[].command id of input command sent to system
  * @property {string} results[].expectation description of expected assertion
  * @property {boolean} results[].pass did command pass or not pass expectation
  */

Then the "permutation[s] of input commands and assertions passing or not passing" could be consolidated into a single array, like this:

 /**
  * Result from a test plan.
  * @typedef AriaATCIData.TestPlanResult
  * @property {string} name name of the test plan, defaults to 'unknown'
  * @property {AriaATCIData.Log[]} log debug messages emitted during execution of test plan
  * @property {object[]} tests
  * @property {string} tests[].filepath filepath of file describing the test in the test plan
  * @property {number} tests[].id numeric id of a test in a test plan
  * @property {AriaATCIData.Log[]} tests[].log subset of log emitted during this single test
  * @property {AriaATCIData.TestResult[]} tests[].results
  */

 /**
  * Result from a single test in a test plan.
  * @typedef AriaATCIData.TestResult
- * @property {object[]} results permutation of input commands and assertions passing or not passing
- * @property {string} results[].command id of input command sent to system
+ * @property {string} command id of input command sent to system
- * @property {string} results[].expectation description of expected assertion
+ * @property {string} expectation description of expected assertion
- * @property {boolean} results[].pass did command pass or not pass expectation
+ * @property {boolean} pass did command pass or not pass expectation
  */
ChrisC commented 2 months ago

@jugglinmike since the time of this issue's posting, the TestResult type has changed somewhat significantly. Here's the latest:

/**
 * Result from a single test in a test plan.
 * @typedef AriaATCIData.TestResult
 * @property {number} testId numeric id of a test in a test plan
 * @property {object[]} commands input commands and the speech emitted
 * @property {string} commands[].command id of input command sent to system
 * @property {string} commands[].output speech emitted
 * @property {string[]} commands[].errors errors that occured while during command
 * @property {Record<string, string>} capabilities Information about the system under test
 * @property {object[]} results permutation of input commands and assertions passing or not passing
 * @property {string} results[].command id of input command sent to system
 * @property {string} results[].expectation description of expected assertion
 * @property {boolean} results[].pass did command pass or not pass expectation
 */

One thing I'm noticing from the current version of the output is that the data stored in the TestResult.results array is not incredibly valuable since we're not actually using the pass boolean in any meaningful way (we're currently always setting it to true). It's really just a list of all the test assertions per command. The TestResult.commands array is a bit more interesting to me because that actually stores the screen reader output from the command.

I'm wondering what you think of this option for rewriting this data structure, so that the property names a little bit more meaningful?

/**
 * Result from a single test in a test plan.
 * @typedef AriaATCIData.TestResult
 * @property {number} testId numeric id of a test in a test plan
 * @property {object[]} commands input commands and the speech emitted
 * @property {string} commands[].command id of input command sent to system
 * @property {string} [commands[].output] speech emitted
 * @property {string[]} [commands[].errors] errors that occured while during command
 * @property {Record<string, string>} capabilities Information about the system under test - * @property {object[]} results permutation of input commands and assertions passing or not passing
- * @property {string} results[].command id of input command sent to system
- * @property {string} results[].expectation description of expected assertion
- * @property {boolean} results[].pass did command pass or not pass expectation
+ * @property {object[]} assertions[] permutation of input commands and assertions 
+ * @property {string} [assertions[].command] id of input command sent to system
+ * @property {string} [assertion[].expectation] description of expected assertion
 */

Or maybe the assertion expectations can just be nested under each command:

/**
 * Result from a single test in a test plan.
 * @typedef AriaATCIData.TestResult
 * @property {number} testId numeric id of a test in a test plan
 * @property {object[]} commands input commands and the speech emitted
 * @property {string} commands[].command id of input command sent to system
 * @property {string} [commands[].output] speech emitted
 * @property {string[]} [commands[].errors] errors that occured while during command
+ * @property {string[]} commands[].expectations description of expected assertion
 * @property {Record<string, string>} capabilities Information about the system under test - * @property {object[]} results permutation of input commands and assertions passing or not passing
- * @property {string} results[].command id of input command sent to system
- * @property {string} results[].expectation description of expected assertion
- * @property {boolean} results[].pass did command pass or not pass expectation
 */

Here's some sample output with the last proposal for bundling everything under commands:

commands: [
    {
      command: "ctrl+opt+left ctrl+opt+left",
      output: "128\n            Red             slider \n Red \n You are currently on a selectable text. ",
      expectations: [
          "Role 'slider' is conveyed",
          "Name 'Red' is conveyed",
          "Value '128' is conveyed",
          "Orientation 'horizontal' is conveyed",
          "Minimum value '0' is conveyed",
          "Maximum value '255' is conveyed",
          "Screen reader switched from reading mode to interaction mode",
      ]
    },
    {
      command: "shift+tab",
      output: "128\n            Red             slider\nYou are currently on a slider. To start interacting with the slider, press Control            -Option            -Shift            -Down Arrow. ",
      expectations: [ ... ]
    },
    {
      command: "shift+j",
      output: "128\n            Green             slider\nYou are currently on a slider. To start interacting with the slider, press Control            -Option            -Shift            -Down Arrow. ",
      expectations: [ ... ]
    },
  ]
jugglinmike commented 2 months ago

Thanks for bringing this up, @ChrisC --the design definitely warrants more thought. Two things come to mind: how we describe the observed behavior and how we structure the assertions.

Observed behavior

We moved away from describing AT behavior as "output" a while back. We started using the more generic term "response" in the interest of making the system amenable to supporting assistive tech beyond screen readers. That's why we should probably change the property name commands[].output to commands[].response.

Honestly, that modification alone doesn't sufficiently honor the spirit of the change in terminology. If we only change the property name, we'll still be presuming that every AT response can be encoded as a single string. I can imagine designs which might address this (e.g. "response": { "speech": "128\nRed slider" }), but I'm not convinced I have enough experience to accommodate a "multiple AT type" future. That's why I'm leaning toward simply updating the property name and accepting that we'll need a breaking change when we're ready to support other kinds of AT. What do you think?

Assertion structure

While it's true that we don't use the "pass" property now, I've recently proposed an extension which would make it meaningful... Or something like it, anyway. Today, the Community Group and the app refer to this as a "verdict." true and false feel a little ambiguous in that context, but the strings "pass" and "fail" would eliminate any doubt among interpreters. Also, if we end up following that proposal to the letter, we'll need to support a third value for the "unknown" case--whenever the observed response has never been reported before.

That said, we don't necessarily have to name it in the current design, but I would like to reserve space for it in order to limit the churn from a future extension. Nesting under "commands" seems good to me; maybe we can just use an object.

New proposal

Building on your final design:

 /**
  * Result from a single test in a test plan.
  * @typedef AriaATCIData.TestResult
  * @property {number} testId numeric id of a test in a test plan
  * @property {object[]} commands input commands and the speech emitted
  * @property {string} commands[].command id of input command sent to system
- * @property {string} [commands[].output] speech emitted
+ * @property {string} [commands[].response] speech emitted
  * @property {string[]} [commands[].errors] errors that occured while during command
- * @property {string[]} commands[].expectations description of expected assertion
+ * @property {object[]} commands[].assertions
+ * @property {string} commands[].assertions[].expectation
+ * @property {"pass"|"fail"|null} commands[].assertions[].verdict
  * @property {Record<string, string>} capabilities Information about the system under test - * @property {object[]} results permutation of input commands and assertions passing or not passing
  */

I can take or leave the "verdict" property for now--we'd just always set it to null in today's implementation. Here's the corresponding sample output:

commands: [
    {
      command: "ctrl+opt+left ctrl+opt+left",
      response: "128\n            Red             slider \n Red \n You are currently on a selectable text. ",
      assertions: [
        {
          expectation: "Role 'slider' is conveyed",
          verdict: null
        },
        {
          expectation: "Name 'Red' is conveyed",
          verdict: null
        },
        {
          expectation: "Value '128' is conveyed",
          verdict: null
        },
        {
          expectation: "Orientation 'horizontal' is conveyed",
          verdict: null
        },
        {
          expectation: "Minimum value '0' is conveyed",
          verdict: null
        },
        {
          expectation: "Maximum value '255' is conveyed",
          verdict: null
        },
        {
          expectation: "Screen reader switched from reading mode to interaction mode",
          verdict: null
        }
      ]
    },
    {
      command: "shift+tab",
      response: "128\n            Red             slider\nYou are currently on a slider. To start interacting with the slider, press Control            -Option            -Shift            -Down Arrow. ",
      assertions: [ ... ]
    },
    {
      command: "shift+j",
      response: "128\n            Green             slider\nYou are currently on a slider. To start interacting with the slider, press Control            -Option            -Shift            -Down Arrow. ",
      assertions: [ ... ]
    },
  ]
ChrisC commented 2 months ago

New proposal

Building on your final design: ...

All this makes sense to me @jugglinmike ! I'll go with this new proposal recommendation and will go ahead and include the 'verdict' property and set it to null!