Impact of hint text/help messages on test results

jscholes commented 3 years ago

This came up on an ARIA-AT CG call recently. For some controls, JAWS and macOS VoiceOver speak some associated help/hint text, which in VoiceOver's case is often quite substantial. Help text announcements are enabled by default, so they will be heard by human testers. The question is, should these messages contribute to the passing or failing of assertions in any way?

A practical example: on the Mac, the main announcement for an editable combobox doesn't convey editable state, but the hint text does. If we provide testers with an assertion that the editable state was conveyed, should it pass or fail? Note that AFAIK, the help text isn't repeated when querying information about the control e.g. with VO+F3, so the assertion will fail in at least some circumstances.

jscholes commented 3 years ago

Update following the February 4th community group meeting: we ideally need more examples of where hint text from a screen reader conveys information considered critical to an assertion passing. In the case of macOS behaviour for combo boxes:

For an editable combo box (https://github.com/w3c/aria-at/issues/340), as noted above VoiceOver only indicates that text can be typed in the control as part of the hint text.
Question: is the role of combo box on the Mac sufficient to convey editable state on its own? I.e. should all controls that are identified by VO as combo boxes be considered to be editable?
Currently, no. The select-only combo box (https://github.com/w3c/aria-at/issues/331) is identified as a combo box in Safari, but a pop-up button in Chrome.
The Chrome behaviour matches that of native select-style widgets.
We should file an issue against the Core Accessibility API Mappings, which currently include a mapping of "AXComboBox" for anything with a combobox role, regardless of whether it is editable or not. If a combo box is select-only, is it reasonable for macOS user agents to present a pop-up button instead?

jscholes commented 3 years ago

Comment from @robfentress:

One issue I had when I was doing testing before was that I wasn't sure what to include when recording the output for VoiceOver. In particular, in the default configuration, VO announces hint text about the item in the VO cursor after a brief delay. I don't remember if we ever decided how to handle this. Would it simplify things if we changed the configuration instructions to instruct testers to turn off hints in the VO Configuration Utility before conducting testing? Is any information we are looking for in our assertions communicated exclusively through hint text? If hints are to remain enabled in our configuration recommendations, then I think we should probably add something to our instructions telling users to make sure to wait long enough for them to surface.

mcking65 commented 2 days ago

Here is the resolution I propose.

Define hint in the glossary

Hint: Supplemental output provided by a screen reader that is intended to help users understand an element or how to use it. Hints are distinct from base output that includes required information, such as name, role, value, state, and properties. Hints may repeat information that is included in base output. Hints that are conveyed with speech are spoken after base output.

Update Testing guidance

Why does ARIA-AT not test hints, i.e., why do test assertions not apply to hints?

Hints are supplemental output. While it is very important that hints do not interfere with interoperability, hints are not essential functionality. The scope of ARIA-AT is limited to fostering interoperability of essential functionality.
Because users of certain screen readers commonly disable hints, base output must satisfy interoperability requirements. If a hint could satisfy an interoperability requirement, hints would have to be considered essential instead of supplemental.

Why leave hints on during testing?

Testing with default settings is an important principle.
It is important to capture negative side effects caused by hints, e.g., a hint conflicts with base output.

Testing requirements:

Because hints are sometimes part of an AT response, they are included in the AT response recorded for a command. The goal is for response collection to be comprehensive; it must include both positive and negative side effects of the command. Good hints can be considered a positive side effect while inaccurate hints can be considered a negative side effect.
Hint output is not analyzed to determine an assertion verdict. Thus, testers must have the skills and knowledge necessary to distinguish hints from base output.
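
As a rough illustration of these requirements, the sketch below keeps hint text in the recorded response while excluding it from the text an assertion is judged against. It is not part of the ARIA-AT tooling: `RecordedResponse`, `conveyed_in_base_output`, and the sample strings are hypothetical, and in practice verdicts are rendered by human testers rather than by string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedResponse:
    """One AT response, split by a tester (hypothetical structure)."""

    base_output: str  # name, role, value, state, and properties
    hint: str = ""    # supplemental output spoken after the base output

    def full_text(self) -> str:
        # The recorded response stays comprehensive: base output, then hint.
        return " ".join(part for part in (self.base_output, self.hint) if part)


def conveyed_in_base_output(response: RecordedResponse, phrase: str) -> bool:
    # Only base output is consulted; the hint is deliberately ignored.
    return phrase.lower() in response.base_output.lower()


# Made-up strings for illustration, not captured screen reader output.
response = RecordedResponse(
    base_output="Fruit, combo box",
    hint="You can type text into this field.",
)
```

With this split, `response.full_text()` still carries the hint for anyone reading the report, while `conveyed_in_base_output(response, "type text")` is false because the phrase appears only in the hint.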

css-meeting-bot commented 2 days ago

The ARIA-AT Community Group just discussed Issue 365: Impact of hints on testing.

The full IRC log of that discussion

<jugglinmike> Topic: Issue 365: Impact of hints on testing
<jugglinmike> Matt_King: I'm trying to thread the needle here, but I'm not sure we can make everybody happy in this situation
<jugglinmike> Matt_King: I am proposing a solution for how to treat hints
<jugglinmike> github: https://github.com/w3c/aria-at/issues/365
<jugglinmike> Matt_King: I wrote a definition of "hint"
<jugglinmike> Matt_King: I'm calling it supplemental output that is provided by the screen reader
<jugglinmike> Matt_King: The full definition I'm proposing: "Supplemental output provided by a screen reader that is intended to help users understand an element or how to use it. Hints are distinct from base output that includes required information, such as name, role, value, state, and properties. Hints may repeat information that is included in base output. Hints that are conveyed with speech are spoken after base output."
<jugglinmike> IsaDC: I read it, and I agree with it
<jugglinmike> jugglinmike: Does this definition's focus on screen readers exclude other kinds of ATs? Do other ATs even have a concept of hints?
<jugglinmike> IsaDC: On the iPad, VoiceOver doesn't consistently use hints with Braille displays. In some cases, they disappear very quickly
<jugglinmike> Matt_King: I'm happy to change the wording to be a definition of "screen reader hint" as opposed to any other kind of hint because I do think what we're trying to address is specific to screen readers
<jugglinmike> Matt_King: JAWS calls them "messages". I intentionally chose not to use proprietary language
<jugglinmike> dean: If you wanted to be exact, I think the blanket term would probably be something like supplemental information
<jugglinmike> Matt_King: In our content design at Meta, they're referred to as "hints." I don't know how other people do it
<jugglinmike> dean: I vote for "screen reader hint"
<Joe_Humbert> Hint: Supplemental output provided by a screen reader that is intended to help users understand an element or how to use it. Hints are distinct from base output that includes required information, such as name, role, value, state, and properties. Hints may repeat information that is included in base output. Hints that are conveyed with speech are
<Joe_Humbert> spoken after base output.
<jugglinmike> mfairchild: My brain went to "well, the screen reader decides how to provide everything, so everything is provided by the screen reader"
<jugglinmike> Matt_King: We could say, "provided by the screen reader and not the application content"
<jugglinmike> Matt_King: As for testing guidance
<jugglinmike> Matt_King: I've proposed a sort of question--"Why does ARIA-AT not test hints, i.e., why do test assertions not apply to hints?"
<jugglinmike> Matt_King: I gave two reasons for why the assertions would not apply to the hints
<jugglinmike> Matt_King: The first is that the hints are supplemental output and that the scope of ARIA-AT is limited to the essential functionality (hence why we have MUST/SHOULD/MAY distinctions)
<jugglinmike> Matt_King: The second (which we probably discussed the most) is that for certain screen readers, it's really common for users to disable hints. If that's so common for screen readers, and if the hint were to provide essential information, then for those users, we would have to say that the hints are not "supplemental" or "optional", and we can't do that
<jugglinmike> Joe_Humbert: I want to head something off--something we're working through at the Google Accessibility Task Force. Certain elements in iOS and Android do not provide a role, they only provide a hint
<jugglinmike> Matt_King: Yes, I recognize that. We're not there yet, though.
<jugglinmike> Matt_King: Some organizations are taking different approaches to deciding whether that is a problem or not
<jugglinmike> Matt_King: At Facebook, we've had really long discussions about that for things like the Facebook News Feed (a really long widget)
<jugglinmike> Matt_King: Because people use that constantly, we wonder whether or not it actually needs a role
<jugglinmike> Matt_King: But it's a fair point, Joe_Humbert
<jugglinmike> Matt_King: When those kinds of things come up, though, maybe we would say conveying the role is optional.
<jugglinmike> Matt_King: We're not doing iOS and Android, yet, but I can't wait until we do!
<jugglinmike> Matt_King: Anyway, those two points I laid out are why I am suggesting that the hint does not contribute to the assertion verdicts
<jugglinmike> Matt_King: The next part of my proposal is about whether we leave hints on during testing
<jugglinmike> Matt_King: I think we should, and I have two reasons for that, too
<jugglinmike> Matt_King: When we capture an AT response, we want that response to be comprehensive--that is, including all side-effects of the command under test, positive and negative
<jugglinmike> Matt_King: We could consider a helpful hint to be a positive or neutral side-effect
<jugglinmike> Matt_King: There's also the automation system to consider
<jugglinmike> Matt_King: Where it would be difficult to omit hint text
<jugglinmike> Joe_Humbert: I often have to edit the text from the automation system; inserting commas which can change how the output is interpreted
<jugglinmike> Joe_Humbert: Also, why are we capturing it in the output if it doesn't have any impact on the assertion?
<jugglinmike> Matt_King: We have to be able to see what the complete response is in order to render a verdict
<jugglinmike> Matt_King: I wonder if Braille display includes commas...
<jugglinmike> IsaDC: They do not. It's just like the speech
<jugglinmike> Joe_Humbert: Does it separate the pieces, even without a comma?
<jugglinmike> IsaDC: It does, via a double space
<jugglinmike> Matt_King: If I turn on all punctuation in VoiceOver, I don't think it speaks commas
<jugglinmike> Matt_King: I don't know about adding commas. That's an interesting thing
<jugglinmike> Joe_Humbert: It's a separate topic, though. I don't mean to derail this conversation
<jugglinmike> Matt_King: Testing with default settings, except for the two exceptions we've documented
<jugglinmike> First exception: a setting which is frequently changed by users, e.g. quick nav on/off, browse mode vs focus mode, etc.
<jugglinmike> And second: if a particular feature is, by default, expected to be hidden behind a setting
<jugglinmike> Matt_King: When you take all these things together, you get the testing requirements I proposed
<jugglinmike> First: "Because hints are sometimes part of an AT response, they are included in the AT response recorded for a command. The goal is for response collection to be comprehensive; it must include both positive and negative side effects of the command. Good hints can be considered a positive side effect while inaccurate hints can be considered a negative side effect."
<jugglinmike> Matt_King: This puts an important requirement on all of our testers
<jugglinmike> Matt_King: All of our assertion verdicts are determined by people, and they must have the basic skill to be able to differentiate between basic output and hint
<jugglinmike> Matt_King: In other words, if we go with this proposal, testers will need a specific expertise
<jugglinmike> jugglinmike: This expertise seems to preclude the use of Mechanical Turk or of natural language processing
<jugglinmike> Joe_Humbert: I think an AI could be trained to determine hints
<jugglinmike> Matt_King: Or a bot could run the test twice; once with hints enabled and once with hints disabled
<jugglinmike> jugglinmike: If we publish "AT response" as a single string, we will be hiding away the human interpretation that is the designation of "hint text"
<jugglinmike> Matt_King: I agree. It adds some complication, though, and I'm not sure we want to go down that path right now
<jugglinmike> Matt_King: For right now, I think it's going to be important to add a link to the report pages that includes information about interpreting hint text
<jugglinmike> jugglinmike: It's good to think a bit in advance about this, though
<jugglinmike> Matt_King: Right. We could one day extend automation to automatically fill in "basic response" and "hint text" should we track those as separate attributes
<jugglinmike> Matt_King: Okay, back to the proposal on the table
<jugglinmike> Matt_King: I'm not hearing any objections, but I'm also not hearing any elation
<jugglinmike> Joe_Humbert: I just want an answer either way so I can test efficiently and consistently
<jugglinmike> dean: I'm in the same boat
<jugglinmike> Matt_King: I will note that this issue is more than three years old (practically four years old). I was surprised to learn its age when I wrote this up. I'll be happy to resolve it!
<jugglinmike> Matt_King: I'm going to close this proposed solution, but first I will update the wiki (extend the glossary and the testing guidance). We will also create an issue related to linking the reports to some documentation about reading the reports
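
The idea raised in the log above, having a bot run a test once with hints enabled and once with hints disabled, could be sketched roughly as follows. This is only an illustration under stated assumptions: `split_hint` is a hypothetical helper, not part of any ARIA-AT automation, and it relies on the proposed glossary statement that spoken hints come after base output, so the hints-disabled response should be a prefix of the hints-enabled one.

```python
from typing import Optional, Tuple


def split_hint(with_hints: str, without_hints: str) -> Optional[Tuple[str, str]]:
    """Return (base_output, hint), or None when the two runs do not line up.

    Assumes the run without hints captured exactly the base output and that
    any hint is appended after it in the run with hints.
    """
    base = without_hints.strip()
    full = with_hints.strip()
    if not full.startswith(base):
        # The runs differ in more than a trailing hint; leave it to a human.
        return None
    return base, full[len(base):].strip()
```

Returning `None` instead of guessing keeps a human tester in the loop whenever the two runs disagree about the base output itself.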
