What do the score values from the APCA tool relate to?

sdw32 commented 3 years ago

I gather that a higher score is better, but I wondered what these scores relate to? Assuming that the intended outcome is for body paragraphs of text to remain speed readable, it would seem evident that text with a score of 5 ought to be speed readable by more people than text with a score of 4, and so on.

However, I wondered if the boundaries between the scores were arbitrary, or whether they could be related to particular degrees of near vision ability (like critical print size from an MNREAD chart, or the smallest row read from a handheld letter chart like the one administered in the Towards better design survey (https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=6997)).

Ideally, the different scores could be related to the percentage of population who would be able to speed read the text. I would be glad to follow this up further, by whatever means would be appropriate.

Myndex commented 3 years ago

@sdw32 Thank you for commenting.

Conformance scoring is being worked on throughout all of Silver. For Visual contrast, it is described at the Visual Contrast WIKI, which again you were provided with some months ago. Please review.

Ideally, the different scores could be related to the percentage of population who would be able to speed read the text

No. Ideally, the scores do as they are intended to do now, in terms of dividing design aspects into testable blocks with specific goals.

Critical fluent reading speed (what you are calling "speed reading") is exemplified in score level 4. All other score levels have specific purposes for the conformance model.

Scores Short Summary

In short, there is no "score 5." "Preferred (5)" is a "preference target" based on classical design guidelines, but it is not normative as it would be challenging to meet in a typical web-based design. Normative guidelines are based on minimum performances. "Preferred" is to present that it is ideal to exceed the normative minimums, as opposed to considering the normative minimums the "target."

Score 4 is the highest normative score (that counts toward the conformance total). The main LUT is based on the research of Dr. Lovie-Kitchin, Dr. Legge, and others. See the bibliography.

Score 3 is intended as a "catch" for sites that, despite good design intent, slightly missed in one area; it is not much different from 4.

Scores 1 and 2 are considered "poor" and "deficient". They are degraded levels for catching sites that currently pass under WCAG 2.x but have readability problems, giving site owners an opportunity to correct design problems without failing them outright. Failing them outright would create an impediment to the overall acceptance of WCAG 3.0.

Thank you,

Andy

Andrew Somers W3 AGWG Invited Expert Myndex Color Science Researcher https://www.myndex.com/APCA/simple

jspellman commented 3 years ago

@sdw32 Thank you for your comment. Project members are working on your comment. You may see discussion in the comment thread and we may ask for additional information as we work on it. We will mark the official response when we are finished and close the issue. The comments in this answer are the opinions of one person and not the consensus of the group; however, @Myndex is correct that we are looking at the larger issue of conformance scoring. We will give serious consideration to your comment.

ChrisLoiselle commented 3 years ago

@jspellman can be merged with issue #338. #339 relates to score values and Andy breaks those down in his response 26 days ago. The larger issue is conformance scoring, which is a Silver issue and not just Visual Contrast of Text.

ChrisLoiselle commented 3 years ago

DRAFT RESPONSE: This issue can be merged with issue #338. Issue #339 relates to score values and Andy breaks those down in his response. I hope this answers your question; if it does not, please feel free to follow up. Thank you, Chris

sdw32 commented 3 years ago

Thanks for all the comments so far; I have read this discussion with interest.

In general terms, in some work I did on guidelines (https://www.gs1.org/standards/Mobile-Ready-Hero-Image/1-0) with global standards body GS1, the key starting point was to discuss the 'intended outcome', which was a real world outcome that we intended 'conformant' designs to achieve. At the moment, I'm not sure I understand the real world intended outcome that text passing the 'visual contrast of text' guideline aims to achieve. Without an operationalised specification of this intended outcome, it's difficult to critique the proposed model for assessing text to evaluate whether or not it does what it intends to do.

I was imagining this intended outcome would take a form something like 'text that passes this guideline should be speed readable by the vast majority of people with 20/40 vision (or better)'.

At the moment, I haven't yet managed to gather the intended outcome in real-world terms (rather than comparing against an arbitrary line in the sand).

In more specific terms, at the moment I gather 16px is being proposed as the minimum for normal weight text for a score of 4, and 14px for a score of 3. So, I'm trying to understand a quantifiable real-world benefit that is intended to be achieved by 16px text, which would not be achieved by 15px normal weight text.

This sort of argument would be really helpful to convince everyone that they should design their stuff to meet this guideline, and is much more powerful than something like ... we need to do it because a committee of people drew a line in the sand and said that all websites had to exceed this line.

I hope this discussion is helpful, and I would be glad to discuss this issue on a phone call.

sdw32 commented 3 years ago

I had another thought on this topic: it might be ideal if the different score values (5, 4, 3 etc.) could be calibrated to a physical printed size of text. So, I think it would be helpful if the scores could be expressed as something like:

3 = at least as easy to read as ordinary newsprint (10pt regular-weight Times New Roman)

4 = at least as easy to read as a large print book (16pt regular-weight Times New Roman)

sdw32 commented 3 years ago

This issue was discussed on a group call. The following represents Sam Waller's minutes from this call.

Scoring the readability of text (score 4, score 3, etc.)

We spent considerable time discussing the intended meaning of the outcome scores, as described in the above screenshot. SDW proposed the scores would have much more impact if they could be calibrated to real-world everyday text. Ordinary newsprint is a world-wide agreed standard of text, which is typically somewhere between 9 and 11pt Times. It's regarded as a kind of text that the vast majority of people can read, but it's not ideal by any means, and people with mild visual impairments are likely to struggle with it. In fact, having difficulty reading ordinary newsprint is often one of the thresholds used to define visual disabilities.

SDW noted that the purest description of font size on a display screen is CSS px, which is intended to represent an angular unit of measure, so that 1 CSS px = 1.278 minutes of visual angle. To understand how this relates to the more commonly understood pt sizes for fonts, it is necessary to realise that 1 pt is a linear measurement of distance equal to 1/72 inch. The relationship between pt and CSS px can therefore be obtained theoretically, based on an assumed viewing distance, or directly, by assuming a particular device.
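
As a rough sketch of the theoretical route (not part of the call minutes), the angular definition can be turned into a physical size for an assumed viewing distance. The function name and the 28 inch distance are illustrative assumptions; 28 inches is the nominal arm's length used in the CSS definition of the reference pixel, and 1 pt is taken as the standard 1/72 inch.

```python
import math

ARCMIN_PER_CSS_PX = 1.278  # visual angle of one CSS px, as quoted in the minutes
POINTS_PER_INCH = 72       # standard definition: 1 pt = 1/72 inch


def css_px_to_pt(px, viewing_distance_in):
    """Physical size, in pt, that subtends the same visual angle as `px`
    CSS px when viewed from `viewing_distance_in` inches.

    Hypothetical helper for illustration; it treats the angle as scaling
    linearly with the px count, which is adequate for text-sized spans.
    """
    angle_rad = math.radians(px * ARCMIN_PER_CSS_PX / 60)
    size_in = 2 * viewing_distance_in * math.tan(angle_rad / 2)
    return size_in * POINTS_PER_INCH


# At the assumed 28 inch distance, 16 CSS px comes out at about 12 pt.
print(round(css_px_to_pt(16, 28), 1))
```

Halving the assumed viewing distance halves the equivalent pt size, which is why the same CSS px value maps to physically smaller text on a phone than on a desktop monitor.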

Sam noted that on his phone 14px Arial was equivalent to about 7pt Arial. This was challenged on the call as to whether this was really true and whether width=device-width was properly set. [After the call, Sam verified that for his phone, the ratio of device pixels to screen pixels is 2.75, and the screen PPI is 441. So 1 device pixel = 2.75/441 = 1/160 inch. SDW believes this is reasonably standard for both Android and iPhone. Given that 1 pt = 1/72 inch, one device pixel is therefore 72/160 = 0.45pt. If you join all this together to create the formula directly, you get 1px = ‘device pixels to screen pixels’ * 72 / (screen pixels per inch) pt]
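
The device-based calibration can be sketched in a few lines, using the figures Sam reported for his phone (ratio 2.75, 441 PPI) and the standard 1 pt = 1/72 inch. The function and parameter names are illustrative assumptions, not part of any tool discussed in this thread.

```python
POINTS_PER_INCH = 72  # standard definition: 1 pt = 1/72 inch


def pt_per_css_px(pixel_ratio, screen_ppi):
    """Physical size of one CSS px in pt on a given device.

    pixel_ratio: screen pixels per CSS px (2.75 on the phone in the minutes)
    screen_ppi:  physical screen pixels per inch (441 on that phone)
    """
    return pixel_ratio * POINTS_PER_INCH / screen_ppi


scale = pt_per_css_px(2.75, 441)
print(round(scale, 2))       # 0.45 pt per CSS px
print(round(14 * scale, 1))  # 6.3 pt for 14px text
```

On these figures 14px text is a little over 6pt, the same order as the "about 7pt" estimate given on the call. The same function could be applied to a tablet or laptop by substituting that device's ratio and PPI.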

Similar calibrations could be performed for tablets and laptops. There was some discussion on the differentiation between the scores. At the moment, there is very little differentiation between scores, so text will typically either essentially pass score 4 or fail. It was agreed that the equivalence between the scores and physical text sizes on different devices, and the differentiation between the scores needed further review.

bruce-usab commented 3 years ago

I am just noting for some context to this issue thread that with the 3.0 First Public Working Draft (FPWD) each of the five Guidelines had an integer Rating between 0 and 4.

My understanding is that the SAPC/APCA tiers reflect a good-faith effort to help align with this feature of the FPWD.

The FPWD did not reference the SAPC/APCA integer scores. The current 3.0 Editors Draft does not reference the SAPC/APCA integer scores. (Referenced date is 04 August 2021.)

Pinging @mraccess77 — in case LVTF should make a recommendation (which would have to be very soon) for the next publication of 3.0 in TR space.

mraccess77 commented 3 years ago

With the related issue of what the scores mean and the limited time before the release of this draft, I lean toward queueing this up for further discussion in future drafts.

w3c / silver

What do the score values from the APCA tool relate to? #339
