w3c / wcag

Web Content Accessibility Guidelines
https://w3c.github.io/wcag/guidelines/22/

Include font weight for color contrast tests #665

Closed: alastc closed this issue 5 years ago

alastc commented 5 years ago

An issue raised externally by Kevin Marks, reporting here for tracking and so I'm not relying on my replies stream.

A potential gap in the colour contrast measure is that a "thin" font can pass the contrast measure but be very difficult/impossible to read.

Fonts can be set as thin, e.g. Arial thin, or by the nature of the font being very thin.

Varying by set weight

Taking the setting to start with, Kevin suggested varying the required contrast value according to the weight the font is set at, for example:

[image: table of font values thin to extra bold, with AA and AAA values that vary up for thinner, and down for thicker font weights]

Apologies for the image as a table; it came via Twitter. It shows the same values for normal and bold as defined in WCAG 2.x AA and AAA. The values then increase for thinner variants and decrease for thicker variants.

Measuring fonts that are thin by nature

Some fonts are simply thin by design, such as "Wire one", or even 'dotted' variants such as Codystar; both are displayed below at 40pt and 'normal' weight:

[image: two words saying "regular", one very thin, the other bigger in size but dotted, so even harder to discern]

Adriannegger suggested taking an average grey value, but wasn't sure it would be of value.

I could see a method where you take a standard word, take the box around that, and take an average of the color. E.g. thick black text on a white background would give you a moderate grey. Then compare that luminance value to the background (white) to come up with a contrast including thickness of font. That would be good if it equates to readability.
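A rough sketch of that averaging idea in TypeScript (all names here are illustrative, not from any spec or tool, and it assumes the rasterized pixels of a sample word are already available as RGBA data):

    // Rough sketch of the "average the box" idea above. Assumes the
    // rasterized pixels of a sample word are available as RGBA data
    // (e.g. from a canvas). Names are illustrative, not from any spec.
    function channelToLinear(c8: number): number {
      const c = c8 / 255;
      return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    // WCAG relative luminance of one pixel.
    function pixelLuminance(r: number, g: number, b: number): number {
      return 0.2126 * channelToLinear(r)
           + 0.7152 * channelToLinear(g)
           + 0.0722 * channelToLinear(b);
    }

    // Average luminance over the word's bounding box, compared to the
    // background using the standard WCAG 2.x ratio formula.
    function averagedContrast(rgba: Uint8ClampedArray, bgLuminance: number): number {
      let sum = 0;
      for (let i = 0; i < rgba.length; i += 4) {
        sum += pixelLuminance(rgba[i], rgba[i + 1], rgba[i + 2]);
      }
      const avg = sum / (rgba.length / 4);
      const [hi, lo] = avg > bgLuminance ? [avg, bgLuminance] : [bgLuminance, avg];
      return (hi + 0.05) / (lo + 0.05);
    }

Thick black text on white then averages out to a mid grey and scores a lower ratio against white than solid black would, which is the thickness sensitivity being suggested.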

Criticisms

I asked a general question about this to font/typography experts and got some interesting responses including:

Mark Bolton:

Contrast in type design is more complex than stroke weight (thickness). Kerning, hinting, relative stroke weight within glyphs, for example, all have a bearing on perceived density. I see the intent. But stroke weight probably isn’t the right measure unfortunately. Too simplistic given the variables.

Adrian:

The assumption here is that more weight == more legible, which is not true. And there is no standard for what “bold” is. There’s a lot of variables outside the math here. Not to be discouraging, but are we attempting to fix an educational problem with clever engineering?

Overall, there's a lot to the legibility of text, and we have a basic measure that (in my experience in usability testing) does correlate fairly well with whether people with low vision will be able to read it.

There can be some odd gaps, such as not accounting for particularly thin fonts (or weights of fonts); it would be useful to close the gap if there's a reasonable way of measuring and testing it.

We could certainly provide advice on this, perhaps in the WAI tutorials. Whether we can update WCAG would depend on how accurately we could come up with a measure/test procedure that would work reasonably well across sites & scenarios.

patrickhlauke commented 5 years ago

as an aside, fonts have lots of other metrics that make them really difficult to categorize/analyze. even "font size" is fairly meaningless, as it doesn't take into account things like actual x-height, how tall the ascenders/descenders are, etc. it's just as vague and meaningless as "bold" not capturing how "thick" an actual font is...

jake-abma commented 5 years ago

Although it's true that lots of fonts would be even more usable by a majority of people (not all...) with a higher contrast ratio when slimmer or smaller (say 14px / 12px), I don't see added value in considering this. The gap is only there if you want to abuse the SC, and trying to bridge it will, I think, get muddier and lose more value than it gains.

We already tackled "bold", see https://github.com/w3c/wcag/issues/341, with the conclusion that it's not defined/fixed (the same goes for all other weights), and that working with specific thicknesses (1px, 2px, ...) and anti-aliasing is too complex.

WayneEDick commented 5 years ago

I have noticed a very odd phenomenon since I've been playing with spacing. When the spacing is wide enough for me to perceive the words well, the page looks too light. This tends to happen with thinner fonts like Palatino; Georgia and Verdana are not as severe. I think this is a result of font weight, but bold interferes with the delicate differences in letters from Palatino. The only fonts that seem to work better in bold are monotypes, which tend to be too thin in general.

Best, Wayne

mbgower commented 5 years ago

I agree with the overall response, which is that we have a general measure for minimum text contrast levels which hinges on point size, not on other attributes of a font. That relatively simplistic measure has done a pretty good job of guiding page creation towards outcomes which contain text that is more discernible by more of us.

I feel like it's still a designer decision to choose typefaces, or the different weights of a particular font. Even where an illegible typeface is chosen, at least the minimum contrast requirement will ensure it's a bit more perceivable/discernible.

Myndex commented 5 years ago

This is part of the studies and research I am doing for #695, which @alastc just referenced.

One of the key concerns is contrast change due to antialiasing. Thin fonts tend to lose contrast to antialiasing. Also, thin fonts are harder to resolve, especially for the impaired. If the circle of confusion (minimum focus dot size) is more than about 1/2 the stem thickness of the letter T, that font will generally be hard to read regardless of contrast, because the left and right edge blurs will join, occluding the letter - from that point and smaller, legibility rapidly declines.

(CoC of ½ stem width of a glyph as the "critical legibility point" is undergoing studies right now, subject to revision).
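As a trivial sketch, that working criterion might look like this (the ½-stem factor is, as noted, provisional, and the names are illustrative):

    // The "critical legibility point" heuristic described above: if the
    // blur circle exceeds about half the stem width, the left and right
    // edge blurs merge. The 0.5 factor is provisional, per the comment.
    function belowCriticalLegibility(circleOfConfusionPx: number, stemWidthPx: number): boolean {
      return circleOfConfusionPx > 0.5 * stemWidthPx;
    }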

patrickhlauke commented 5 years ago

easy testability will, again, be a big factor here. again, we're moving well beyond what current tools do, or what regular testers can do manually at scale...

Myndex commented 5 years ago

easy testability will, again, be a big factor here. again, we're moving well beyond what current tools do, or what regular testers can do manually at scale...

Sure, but it doesn't have to be that way.

Myndex commented 5 years ago

On many fonts the stem and the bar are different widths, making programmatic assessment more difficult. Not to mention that there is no "official standard" for how "big" a font is; the font metrics of some fonts are vastly different.

This probably requires choosing a few representative standard fonts as a "baseline". But even then, different font foundries make the same typeface with font files that render it at different sizes than other foundries!

For each font itself, stem to bar width ratio within the glyph, glyph aspect ratio, antialiasing, kerning, and total filled area all need to be considered.

OF THESE: I think antialiasing is the "critical" factor. I believe there is a ratio that can be defined between stem and bar/arm width (aka stroke) and size relative to pixels where antialiasing requires that a higher contrast be used.

In the following example, the w in "width" in the above paragraph is perceptually up to twice as light at its normal size as when zoomed in so that the stroke is two pixels wide.

Normal size: [screenshot]

Zoomed in: [screenshot]

Programmatic assessment can probably get at least partway there by analyzing the total filled area vs. non-filled (whitespace) area of a particular glyph. But the relationships of the various stroke thicknesses (stem/bar/arm) within the glyph are also important.
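A minimal sketch of that filled-area analysis, assuming a browser canvas is available (function and parameter names are illustrative):

    // Estimate a glyph's filled ("ink") ratio by rasterizing it to a canvas
    // and counting dark pixels. Illustrative only; as noted above, a real
    // assessment would also need per-stroke (stem/bar/arm) measurements.
    function glyphFillRatio(family: string, sizePx: number, glyph = "T"): number {
      const canvas = document.createElement("canvas");
      canvas.width = canvas.height = sizePx * 2;
      const ctx = canvas.getContext("2d")!;
      ctx.fillStyle = "#fff";
      ctx.fillRect(0, 0, canvas.width, canvas.height);
      ctx.fillStyle = "#000";
      ctx.font = `${sizePx}px ${family}`;
      ctx.textBaseline = "top";
      ctx.fillText(glyph, sizePx / 2, sizePx / 2);
      const { data } = ctx.getImageData(0, 0, canvas.width, canvas.height);
      let ink = 0;
      for (let i = 0; i < data.length; i += 4) {
        if (data[i] < 128) ink++; // red channel below mid grey => "ink" pixel
      }
      return ink / (data.length / 4);
    }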

And since we're talking fonts, some font anatomy:

[image: font anatomy diagram]

alastc commented 5 years ago

Programmatic assessment can probably get at least partway there by analyzing the total filled area vs. non-filled (whitespace) area of a particular glyph. But the relationships of the various stroke thicknesses (stem/bar/arm) within the glyph are also important.

In order to create some testable criteria, I think we'd need some kind of score per typeface, either based on its filled/non-filled ratio, or based on measures of the thickness of stem/bar/arm (or whatever someone can come up with).

That way common fonts could be scored (and a table created), and new fonts could then be added fairly easily. In future (the Silver timeframe) that could even drive the size aspect as well, e.g. if a typeface scores 50%, it needs to be over 30px to use the 3:1 ratio (made-up example).
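As a purely hypothetical sketch of how such a score table might drive requirements (every name and number below is invented, mirroring the made-up example above):

    // Hypothetical per-typeface score table driving the size threshold at
    // which the relaxed 3:1 ratio may be used. All values are invented.
    const typefaceScores: Record<string, number> = {
      "Helvetica": 0.62,
      "Roboto Thin": 0.35,
    };

    function minSizePxForRelaxedRatio(typeface: string): number {
      const score = typefaceScores[typeface] ?? 0.5; // unknown fonts: middling
      return score >= 0.5 ? 24 : 30; // heavier-scoring faces qualify sooner
    }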

I know I raised this, but it is essentially an external comment, so I'd like to get the whole group's assessment and propose a response (to be surveyed and agreed as the official response):


It is currently very difficult to assess font weight as part of contrast, beyond the blunt mechanism that is part of the guidelines already. It does appear that there could be a mechanism to measure font weight that includes sizing and weighting of different fonts. Therefore we will keep this as an idea for future guidelines.

It would require some research and the creation of a tool, or at least an assessment method, for typefaces. The AGWG is unlikely to be able to create that as part of the group's scope, but other people (or members of the group) are encouraged to investigate this.

patrickhlauke commented 5 years ago

In order to create some testable criteria, I think we'd need some kind of score per typeface

this works for "common" fonts. How are auditors/developers supposed to handle any number of custom fonts out there? Or text set with a mystery font on an image, or presented as outlines in an SVG? without some form of automated tool (not currently in existence), this will get very difficult very quickly. While I appreciate the desire for scientific accuracy, this may well lead down a very large sinkhole.

[edit] would it make sense to introduce a level of "subjective / use your judgement" here? or is that kind of approach (which some 2.0 SCs have, and which relied on the "8 out of 10 cats would say this is a pass" type of argument) not ok anymore?

alastc commented 5 years ago

The idea is that the measures are set, and anyone can measure a font. However, it doesn't have to be done every time, once you have a score for a typeface it should be recorded and doesn't have to be redone.

In any case, I'm suggesting we put this on hold until someone (or organisation) has done the work to create that measure (on the assumption it is possible).

Myndex commented 5 years ago

"Assessing this" has always been the point of having a professional designer — this is an issue as old as design itself.

Hi @patrickhlauke

without some form of automated tool (not currently in existence),

There are such tools, they are called Image Assessment Models. One is Mark Fairchild's at RIT, called ICAM. CIECAM02 is another.

Hi @alastc

The AGWG is unlikely to be able to create that as part of the group's scope, but other people (or members of the group) are encouraged to investigate this.

This is a part of what I am researching. As I've stated, I consider font size and weight INSEPARABLE from contrast. They are interconnected. ON THAT NOTE, there are new experiments up, and I've posted in #695 regarding them. The test DIVs have a variety of size and weight examples.


I like the idea of a "font index," if a method is developed that is reliable, it would in essence put the ball in the court of each foundry, which IMO is ideal.

The big issue is that fonts are complex — some such as blackletter fonts are hard to read at any size or contrast (LOL — I actually designed a font I call "Legible Old English" for this very reason, it's what I used for the titles of the film Southpaw). But it should be possible to create a "criteria", most important are the primary strokes — Stem, Arm, Bar, Stress, Diagonal — and exclude serifs, swashes, and other "more ornamental" aspects when determining "weight" for contrast purposes.

Though there are the cognitive issues too, of course. The purpose of serifs, for example, is to improve readability of dark text on a light page. But on a computer screen, serifs DON'T help, unless it's an ultra-high-res screen like a Retina display, OR the font is large enough that the serifs render correctly.

But one thing relating to the accessibility of textual content is that on a computer screen, fonts with well-balanced stroke widths (i.e. the stem and bar are close to the same) render better on most displays (similar to the serif issue).

patrickhlauke commented 5 years ago

"Assessing this" has always been the point of having a professional designer — this is an issue as old as design itself.

@Myndex this is about 3rd party auditors (such as myself and many others on here) going through some other person/company's site to give them a WCAG assessment. Not the platonic ideal of a designer who is trying to do the right thing and spends time making carefully researched choices, but the "here's this site we have, you have two days to tell us if we 'pass WCAG' or not, GO!"

There are such tools, they are called Image Assessment Models. One is Mark Fairchild's at RIT, called ICAM. CIECAM02 is another.

Can I download one now, point it at the font file that a site I've just been given uses, and get back a value that I can just plug into the new contrast algorithm that's being tweaked?

I like the idea of a "font index," if a method is developed that is reliable, it would in essence put the ball in the court of each foundry, which IMO is ideal.

Note that yes, it's always ideal to get big entities to do all the work, but we have found with WCAG (and much other standards-related work) ... it doesn't happen just because you will it to.

alastc commented 5 years ago

I consider font size and weight INSEPARABLE from contrast.

Ok, but we have to work out some measures in order to test things.

Our current association of contrast and font size/weight is:

  • Requiring 4.5:1 for text.
  • Requiring 3:1 for text over 19px bold, or 24px regular.

Making it more nuanced is good, if we have a clear method to do so.
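In code form, that association looks like this (the 19px/24px thresholds are as quoted in this thread; the normative SC phrases them as 14pt bold / 18pt regular):

    // The current WCAG 2.x AA association as stated above. The 19px/24px
    // figures are the ones quoted in this thread.
    function requiredContrastAA(sizePx: number, bold: boolean): number {
      const largeText = bold ? sizePx > 19 : sizePx > 24;
      return largeText ? 3 : 4.5;
    }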

There are such tools, they are called Image Assessment Models. One is Mark Fairchild's at RIT, called ICAM. CIECAM02 is another.

From a quick look at ICAM and CIECAM02 they are aimed at images rather than text (e.g. HDR rendering), and not something you could point at a paragraph of text to work out the readability.

Hopefully that impression is wrong, but I couldn't connect the dots there.

Myndex commented 5 years ago

@Myndex this is about 3rd party auditors (such as myself and many others on here) going through some other person/company's site to give them a WCAG assessment. Not the platonic ideal of a designer who is trying to do the right thing and spends time making carefully researched choices, but the "here's this site we have, you have two days to tell us if we 'pass WCAG' or not, GO!"

Got it, good to know where you're coming from, and I understand your position/viewpoint now, thank you.

There are such tools, they are called Image Assessment Models. One is Mark Fairchild's at RIT, called ICAM. CIECAM02 is another.

Can I download one now, point it at the font file that a site I've just been given uses, and get back a value that I can just plug into the new contrast algorithm that's being tweaked?

Now that I understand the issue as it presents itself for you: not sure in total, but not that I'm aware of as far as WCAG standards are concerned.

We were talking apples and oranges before. I'm thinking/discussing in terms of current technology and research, whereas I see that what you're talking about is a "reduction to practice", i.e. practical tools you can use to perform assessments.

Please keep in mind that what I am talking about is the direction the research is heading, to a FUTURE standard (WCAG 3.0 and beyond). The only things I am proposing for WCAG 2.2 involve the currently available tools.

And PLEASE keep in mind that what you are seeing me do right now is early on in the research stage. If this was a feature film, the research would be behind very closed doors, between me and the other filmmakers. I'm not so used to such a public venue at this stage, but I do want to keep it open for all to discuss. Just please understand that I have no intention of proposing any change that is not well founded with a useful and practical implementation.

I realize from where we're standing right now some things may look "impossible" or "unreasonably difficult" but that is always the case in early research. I am a bit surprised at certain gaps in available research, and there are things that need to be explored that have yet to be, especially in terms of "graphically rich webpages".

I like the idea of a "font index," if a method is developed that is reliable, it would in essence put the ball in the court of each foundry, which IMO is ideal.

Note that yes, it's always ideal to get big entities to do all the work, but we found that with WCAG (and many other standards-related work) ... it doesn't happen if you just will it to.

I am a pro-active researcher/developer. I am doing hands-on original research; I'm not "willing" anything. As for large entities: this standard makes its way into laws, and that is usually one good incentive for "big entities". Moreover, say Google adopts such-and-such criteria for Google Fonts, so that becomes what everyone uses; foundries will have to follow suit for competitive reasons.

I really believe this is a "build it they will come" scenario.

Myndex commented 5 years ago

I consider font size and weight INSEPARABLE from contrast.

Ok, but we have to work out some measures in order to test things.

Agreed. But that statement (that I consider them inseparable) is based on my recent experiments and trials. When it comes to computer displays, font size and weight are possibly more intertwined with contrast than any other medium/application.

In feature-film-land (and TV) it's been a substantial issue due to the lack of resolution. 2K (the typical film resolution in the theater) is not enough for good text rendering, so we developed many tricks to make text on screen look acceptable. We are faced with what I consider similar issues here.

But to be honest, some form of "common denominator" for assessing a font is something designers have needed, like, forever.

Our current association of contrast and font size/weight is:

  • Requiring 4.5:1 for text.
  • Requiring 3:1 for text over 19px bold, or 24px regular.

Making it more nuanced is good, if we have a clear method to do so.

I'm not as interested in making it more "nuanced" as I am making it robust and uniformly applicable.

There are such tools, they are called Image Assessment Models. One is Mark Fairchild's at RIT, called ICAM. CIECAM02 is another.

From a quick look at ICAM and CIECAM02 they are aimed at images rather than text (e.g. HDR rendering), and not something you could point at a paragraph of text to work out the readability.

Hopefully that impression is wrong, but I couldn't connect the dots there.

Well, they present the science of image assessment. ICAM in particular has a contrast module, and a lot of modules that are intended to assess certain aspects of an image.

The way I would expect them to be implemented would, I think, involve rasterizing the webpage into an image (i.e. making a screenshot) and then using the model to analyze that.

What I am working on now are more basic things that look at CSS values — i.e. color, size, padding — those are trivial to examine programmatically.

alastc commented 5 years ago

The way I would expect them to be implemented would, I think, involve rasterizing the webpage into an image (i.e. making a screenshot) and then using the model to analyze that.

That is unlikely to be a good way for doing testing on a per-site basis, as the results would vary by platform (windows/mac often render differently), screen size, and other settings that vary between systems.

That could be the basis for providing a score for a particular typeface under controlled conditions, which could then be used for testing per-site.

It is useful to examine the size & weight (the reason for this thread), but since it varies wildly by typeface, I think that needs to be included in some way.

Myndex commented 5 years ago

That is unlikely to be a good way for doing testing on a per-site basis, as the results would vary by platform (windows/mac often render differently), screen size, and other settings that vary between systems.

This is true of literally everything that gets displayed: screen brightness varies from under 80 nits to over 1200 nits. Color contrast, luminance contrast, images, and graphic elements render differently depending on screen resolution — serif fonts look great on a Retina display, but not on a 720p HD monitor, for instance.

But there are some base standards to use as a lowest common denominator. This is true for Contrast, Fonts, Padding, etc.

That could be the basis for providing a score for a particular typeface under controlled conditions, which could then be used for testing per-site.

This last week I created a few different algorithms which I am currently testing for font contrast and weight assessment, as a tangent to the luminance contrast I am working on in 695. I am applying my knowledge of perception to a reliable, repeatable, robust, perceptually uniform, and most importantly objective assessment of size, weight, and contrast. I've only gone through a half-dozen fonts but I am encouraged by the results thus far.

It is useful to examine the size & weight (the reason for this thread), but since it varies wildly by typeface, I think that needs to be included in some way.

Testing a font will be literally as easy as testing color contrast. I'm pretty happy with how the test mechanism(s) are working; there is some manual data prep that still needs to be automated, but the basics of a useful and rational font assessment are within reach.

As I have stated, all my research on luminance contrast points to font-size, font-weight, and DIV/P padding as equally if not more important for an accessibility criterion for graphically rich content displayed on a computer/phone/tablet monitor.

While there are a lot of different device types, they all share a lot of key characteristics, and the web/apps/computer content in general is "pretty much" a unique & well defined ecosystem. True, things like alternate colorspaces and deeper color are on the horizon, but all this work is still relevant (if not more so) in those cases.

alastc commented 5 years ago

All I'm saying is that content accessibility guidelines need to be based on measures of what the content is (from the 'author'), rather than how it is rendered across hundreds of different scenarios. The measure should strongly correlate with how it is rendered at the other end (like the contrast aspect), but the testing can't be based on that.

Examples:

  • If you rasterise a page on one platform, including the fonts, you will get different results from other platforms (difficult).
  • If you test a color is set at #003366, that will be set at that value across platforms (easy).
  • Rasterising a page on a 1280px wide screen will be very different from a 320px screen (impossible).
  • Aliasing of fonts will vary dramatically depending on font settings like size & weight (easy but may not correlate strongly).
  • Aliasing & 'thickness' also depend on the chosen font-face and sites use a lot of fonts from places like typekit which are set to only include certain weights (reduce download), and then the CSS setting is essentially ignored (difficult).
  • Similarly, we can test that a font is set as X (e.g. Arial), but whether that font supports a weight setting of 500 is another matter (difficult).

patrickhlauke commented 5 years ago

Additional point: accounting (or not) for font-family cascade/fallback. Do all possible defined fonts need to be checked, or should testing assume the first desired font is the only one that needs to be tested?

Myndex commented 5 years ago

Hi @patrickhlauke

Additional point: accounting (or not) for font-family cascade/fallback. Do all possible defined fonts need to be checked, or should testing assume the first desired font is the only one that needs to be tested?

That's a good point — and something that should probably be listed in the standard as this subject develops. I'd think that the standard should be something on the order of:

"The smallest rendered size, weight, and contrast font in a given list of fall-backs shall be the font tested for compliance."

But on this thought: as font-family fallback lists are already part of the CSS standard, wouldn't it be useful to extend that ever so slightly, such that the LAST font in the list would be the "accessible" font, and browsers could send in their HTTP requests a "prefer accessible fonts" flag, so that the "accessible font" would be served instead of the first font in the list?

This would give the designer some additional control over the look and feel of accessibility, while reducing concerns over some aspects of overall design.

patrickhlauke commented 5 years ago

wouldn't it be useful

possibly, but this goes way beyond the topic at hand: providing baseline tests for web content created here and now, displayed in current browsers...

Myndex commented 5 years ago

Hi @alastc I understand what you're saying, and I think we agree but are using different language to describe the same things (for the most part).

All I'm saying is that content accessibility guidelines need to be based on measures of what the content is (from the 'author'), rather than how it is rendered across hundreds of different scenarios. The measure should strongly correlate with how it is rendered at the other end (like the contrast aspect), but the testing can't be based on that.

In this regard, what I was saying is that testing could be based on a lowest-common-denominator standard. It is fairly easy to define a lowest common denominator because computers are a defined, closed ecosystem where:

  1. The color model is RGB
    1. The colorspace is sRGB (or color managed for non-sRGB spaces)
    2. Some browsers and operating systems are not color managed and expect content to be in sRGB, and won't adjust non-sRGB content.
  2. Displays are bitmapped (i.e. pixel based).
    1. Bitmapped displays are inherently limited in resolution.
    2. Different browsers and OSes use different default fonts, and different techniques for anti-aliasing and rasterizing text and graphic elements to a bit-mapped output.
    3. Different rasterizing methods can result in different contrasts, sizes of fonts and other elements.
  3. Users have some degree of control:
    1. zoom size is easy.
    2. Font weight, style, design, color is not.
  4. Displays are emissive.
    1. Emissive displays are substantially affected by ambient light.
    2. Different displays have different luminance and contrast capabilities which along with display surface treatment results in varying response to operating conditions & ambient light.
    3. Ambient light can range from near total darkness (night bedroom) where it's not a factor to blazing daylight where even the bright mobile devices may have a hard time competing with the sun. (Depends on screen reflection)[1]
  5. Users have some degree of control over brightness and environment (i.e. moving into the shade to use the device).

Therefore:

Lowest Common Denominator Testing:

THUS: A lowest common denominator test would use strong antialiasing on a low or moderate resolution screen.

THUS: A lowest common denominator test would use something like (for example only) a common screen brightness (160 nit) with a high ambient light (500 lux) and a light colored surround (LRV 70).

Both of these "common denominators" can be emulated in software, and both can be assessed using math, along with the assumptions I've listed. I.e. they can be programmatic. Some may be a little harder to implement than the current contrast equation, but the deeper I get into this, the clearer these concepts become.

The font analysis I am working on uses a strong antialiasing effect to balance the assessment and create a weighting, especially for thin fonts. The contrast equations I am developing consider the effects of ambient light, which the current WCAG equation does not (sorry, but adding 0.05 to both sides of a ratio is not the way to model ambient light!)
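To illustrate the distinction as a sketch only (the actual equations under development are not published in this thread, and every constant below is an assumption):

    // WCAG 2.x ratio, with its fixed 0.05 "flare" term on both sides.
    function wcagRatio(lHigh: number, lLow: number): number {
      return (lHigh + 0.05) / (lLow + 0.05);
    }

    // Illustrative alternative: ambient light reflected by the screen adds
    // to BOTH luminances before the ratio is taken, compressing contrast.
    // A diffuse surface under E lux with reflectance r has luminance r*E/pi nits.
    function ambientAdjustedRatio(
      lHigh: number,            // relative luminance, 0..1
      lLow: number,
      ambientLux = 500,         // bright office, assumed
      screenReflectance = 0.04, // 4%, assumed
      screenPeakNits = 160,     // assumed peak white
    ): number {
      const flare = (ambientLux / Math.PI) * screenReflectance / screenPeakNits;
      return (lHigh + flare) / (lLow + flare);
    }

Under those assumed conditions the flare term works out to roughly 0.04, close to WCAG's fixed 0.05, but it grows with ambient light and shrinks on brighter screens rather than staying constant.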

— A BRIGHT LINE —

Where the Designer's Responsibility Ends, and the User's Begins

Assuming the user does not use a custom style sheet or special accessibility software, there are certain things they can and cannot do to assist accessibility:

CAN:

  • Zoom the page (text size).
  • Adjust screen brightness and their environment (e.g. move into the shade).

CAN'T:

  • Change font weight, style, or design.
  • Change colors, padding, or contrast.

Nevertheless, the few things a user can do (without the extraordinary use of a style sheet) must still be considered by a professional designer. But that designer needs to be particularly well informed/guided on the issues the user can't control, like weight, colors, padding, & contrast.

One of the things my experiments revealed about the current WCAG math for contrast is that "it's more or less okay in a completely dark room." That is, with NO ambient light it appears relatively uniform. But under real-world (and especially mobile) usage conditions it fails, as I have previously outlined.

Thus, arbitrary numbers without empirical basis or a "lowest common denominator" standard in effect are not particularly instructive.

The implications for testing standards are addressed below against each of your examples.

Examples:

  • If you rasterise a page on one platform, including the fonts, you will get different results from other platforms (difficult).

Thus the need for a single "lowest common denominator" as a benchmark from which other systems can be referenced.

  • If you test a color is set at #003366, that will be set at that value across platforms (easy).

Actually, no it won't. It may be that in the CSS, but it may or may NOT be that by the time it gets to the display. This is dependent on color management, monitor profiles, system color spaces, etc.

  • Rasterising a page on a 1280px wide screen will be very different from a 320px screen (impossible).

Thus the need for a single "lowest common denominator." And as for testing standards, this is not impossible: you could test a 1280px rasterization on a 320px screen by rendering a quarter of it at a time.

  • Aliasing of fonts will vary dramatically depending on font settings like size & weight (easy but may not correlate strongly).

Font size and weight have nothing to do with antialiasing effects (except that some systems reduce or turn off antialiasing for very small fonts — it's an option in Safari, for instance). Different systems and different methods do affect antialiasing. Antialiasing is a function of the system's rasterizer.

  • Aliasing & 'thickness' also depend on the chosen font-face and sites use a lot of fonts from places like typekit which are set to only include certain weights (reduce download), and then the CSS setting is essentially ignored (difficult).

The CSS setting is not ignored; the font-face/import settings all play a part here, and those are part of the CSS. In properly formatted CSS with font imports set, the specified weights are defined. Again, aliasing has nothing to do with the FONT (except for any hinting that may be included in the font); antialiasing is a function of the rasterizer.

  • Similarly, we can test that a font is set as X (e.g. Arial), but whether that font supports a weight setting of 500 is another matter (difficult).

What weights are supported is (or should be) a function of the CSS font import definitions at the head of the CSS sheet(s). It's either there or it isn't, so I'm not sure of the point you're getting at. Some fonts like IMPACT or ARIAL BLACK are always "bold" regardless of whether you set font-weight: normal;

SUMMARY

Every aspect of a font is defined in a font file. The fonts on any given webpage are used based on the CSS/HTML, which specifies the specific font files to use when rasterizing the page. Parsing the page to see the font-size, declared font-weight, and padding as specified by the webpage designer is similarly straightforward.

Nevertheless, those font files can be tested easily, and it's mostly straightforward. At the moment, "bold", "normal", "300", and "900" have no absolute meaning: 900 is "boldest/blackest" and 100 is "lightest", but how that relates to how many pixels thick the stem of the capital "T" is at font-weight 500 is not in any way standardized, and the variation between font families is enormous. But this can be done, and the means to do so (after the app is completed, LOL) can and should be as easy as testing color contrast in a contrast checker.

Then the W3C will only need to codify the objective results into the standard.

To summarize what I'm working on:

Font Assessment Example

So in that regard, here is ROBOTO 100, the first at 32px and the second at 16px. Contrast is at MAXIMUM with #FFF and #000:

[screenshots: Roboto 100 samples]

Depending on how aggressive the settings are, the algorithm sees:

[screenshot] TOP: 32px. BOTTOM: 16px (Roboto 100).

The font is a "pass" at 32px, provided the luminance contrast of the color pair is more than X amount (e.g. more than 7:1, as a possible example). The font is a FAIL for impaired users at 16px: even at the maximum CSS contrast of #FFF/#000, antialiasing renders the "i" useless, with a darkest value of #92C2E8, providing a WCAG contrast of only 1.89. At 16px, this font is a bad choice for all users.
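That 1.89 figure checks out against the standard WCAG 2.x formulas:

    // Verify the 1.89 figure: the antialiased stem's darkest pixel #92C2E8
    // against white, using the standard WCAG 2.x luminance/ratio formulas.
    function srgbToLinear(c8: number): number {
      const c = c8 / 255;
      return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    function relativeLuminance(hex: string): number {
      const [r, g, b] = [1, 3, 5].map(i => parseInt(hex.slice(i, i + 2), 16));
      return 0.2126 * srgbToLinear(r) + 0.7152 * srgbToLinear(g) + 0.0722 * srgbToLinear(b);
    }

    const lStem = relativeLuminance("#92C2E8");      // ≈ 0.505
    console.log((1.05 / (lStem + 0.05)).toFixed(2)); // "1.89" vs. white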

CONFERENCE CALL?

I understand there is a regular conference call for visual accessibility? When and how can I be involved?

Thank you!

Andy

FOOTNOTES:

[1] (Daylight readability has more to do with screen reflectance. The effective total contrast of a screen is 1 + (emitted light / reflected light). For instance, a quality military display has an anti-reflective polarizer that lowers surface reflectance and minimizes screen scattering, for a screen reflectance of under 0.3%. Given that average sunlight is about 10,000 nits, a 320 nit screen then has an effective contrast ratio of better than 1 + (320 / (0.003 × 10,000)) = 1 + 10.67 ≈ 11.67.) SOURCE: http://www.ruggedpcreview.com/3_slates_motion_j3400.html
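The footnote's arithmetic, for reference:

    // Effective on-screen contrast in ambient light, per the footnote:
    // 1 + emitted / reflected, with reflected = reflectance × ambient nits.
    function effectiveScreenContrast(emittedNits: number, reflectance: number, ambientNits: number): number {
      return 1 + emittedNits / (reflectance * ambientNits);
    }

    console.log(effectiveScreenContrast(320, 0.003, 10_000)); // ≈ 11.67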

Myndex commented 5 years ago

this goes way beyond the topic at hand...

No it doesn't.

The concept of a "prefer accessible fonts" flag, and a specified accessible font in the font fallback list, is DIRECTLY RELATED to the topic at hand, actually, which is a discussion of font weight and contrast. A user-selectable flag that forces a fall-back to an accessible font directly relates to font weight and contrast, as it shifts the priority for testing accessibility to a base set of accessible fonts.

It would make YOUR job easier, FWIW.

patrickhlauke commented 5 years ago

@Myndex ... seriously ... my point is this is not the place to discuss potential changes to browsers and their behaviour. this is the issue tracker for WCAG ...

alastc commented 5 years ago

Hi @Myndex,

I think we are agreeing in terms of purpose, but I think some of the assumptions are a little optimistic (after 20 years of focusing on web development and web standards).

Not everything by any means, just specific things like:

A lowest common denominator test would use strong antialiasing on a low or moderate resolution screen. ... Antialiasing is a function of the system's rasterizer.

I trust you're correct about the system rasterizer, but that means people wouldn't get consistent results across systems. If someone testing on Windows gets a different result (like this example), a lowest common denominator test is not something we can put onto web developers; perhaps it could work at the font level with a pre-defined setup.

As another example, this is a screenshot of the same page on two different browsers on my Mac:

[screenshot: a menu with text shown twice; the left version is much thicker than the right version]

I like & appreciate the font-assessment example (it makes the problem really clear); I'm just saying that it would make sense to conduct that in a pre-defined setup and publish the results. Anyone can contribute, but for testing a web page you'd be referring to a font 'score' of some kind.

Side note: users can change the font-face and other factors; we have a guideline aimed at supporting that. However, that is for users prepared to replace fonts and adapt a page; the contrast guidelines are about the intended (authored) view.

What weights are supported is (or should be) a function of the CSS font import definitions at the head of the CSS sheet(s). It's either there or it isn't, so I'm not sure of the point you're getting at.

That my font import could only include the regular weight, and my CSS could define it as bold. A test that interrogates the CSS will not know that, so the result would not be valid. That's why I defined it as "difficult"; it's theoretically possible to also analyse the font import, but not easy (unless you know a method?).

Regarding the font-family cascade/fall-back, if a site is importing a font so that (by default) the browser downloads it, we should consider that the main focus of testing.

If a site puts in an uncommon font first, and backs up with standard fonts, that's a trickier issue, but my first thought is to test with the first font and then the first common font. Open to suggestions there, but generally I find people either stick to standard fonts or import them. (Unless you want to impress the CEO who has a particular font installed ;-) )

HTTP requests a flag that was "prefer accessible fonts"...

Patrick is right (if grumpy, ahem) about the scope. I've been saying profiles would be really helpful for over a decade, but sayin' don't make it so. As these guidelines are aimed at what websites should do, they take the current state of user-agents as the context. If we stick in a user-agent requirement, nothing happens.

I think (others may know better) that the roadmap for that would be:

  • Create a prototype, e.g. browser plugin plus some form of server-side code to demonstrate the concept.
  • Post about it on discourse.wicg.io.
  • Hopefully get some (browser) implementor interest in standardising as part of HTML/CSS.
  • Loop back around to WCAG to create a guideline to enforce that.

Having said that, Silver (WCAG.next) will have user-agents in scope to a certain extent, so there may be more options in future.

I'll contact you by email about the conference calls, there's some admin around that and I'd like to make sure it is useful for you, and the group.

Cheers,

-Alastair

Myndex commented 5 years ago

Hi @alastc

I think we are agreeing in terms of purpose, but I think some of the assumptions are a little optimistic (after 20 years of focusing on web development and web standards). ...snip... I trust you're correct about the system rasterizer, but that means people wouldn't get consistent results across systems. If someone testing on Windows gets a different result (like this example), a lowest common denominator test is not something we can put onto web developers; perhaps it could work at the font level with a pre-defined setup.

Well, perhaps I didn't define the purpose well; a "lowest denominator" standard's first purpose is to have a defined benchmark with which to test tools and future standards. A corollary is how the CIE developed colorspaces like XYZ and L*a*b* to be device independent, by using a "standard observer" — in that case an average of the test subjects used to evaluate color matches in the various perception experiments.

The reason I brought the issue up is that the "main" standard is outdated and nearly obsolete: IEC 61966-2-1, the standard for sRGB, has little relevance to current technology. It specifies a luminance of 80 nits and an ambient of 64 lux, which is laughable by today's standards, when phones are available at 1200 nits, designed to work in daylight at 10,000+ lux. Even the cheapest LCD monitors easily do 200 to 300 nits.

I think what I am really trying to say is that today, in May of 2019, there appears to be no realistic standard that reflects the current gestalt of display technology for computers and mobile devices.

As another example, this is a screenshot of the same page on two different browsers on my Mac:

If I had to guess, is one of them Opera (on the left)? It appears to me that one of them is using a different font for the menu items; the one on the right is using a lighter font. I don't think this is a rasterizer issue. I think one of the browsers is not interpreting the CSS or HTML correctly, or a tag is being used that the browser does not support. I'd have to see the HTML/CSS to determine if that's the issue.

I like & appreciate the font-assessment example (it makes the problem really clear), I'm just saying that it would make sense to conducted that in a pre-defined setup and publish the results. Anyone can contribute, but for testing a web page you'd be referring to a font 'score' of some kind.

Well, part of the idea with the algorithm is specifically to provide a reliable score, so that "thisFont" at 400 compares to a "set standard" in its font class by a certain value. For the "set standard" I am using Helvetica, Times, and related "web safe" fonts. Arial and Helvetica are nearly identical, as are Times and Times New Roman. By "nearly identical" what I mean is that if you lay Arial on top of Helvetica in color-difference mode, you get a plain black screen (slash added to show underlying text):

[screenshot: Helvetica/Arial difference overlay]

In an image editor's difference mode, when two identical images are overlaid, the screen goes completely black. Here a web DIV containing Helvetica text at 16px is overlaid by an identical DIV using Arial. The only little bits you really see are where the antialiasing is slightly different, and of course where the words "Helvetica" and "Arial" overlap.

I mention this as Arial is on Windows machines, and Helvetica is more likely on a Mac or Linux machine but often NOT on Windows. But for the purposes of size and weight they are essentially identical. The same is true for Times and Times New Roman.
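The same overlay test can be sketched with a browser canvas (illustrative only; a careful comparison would align baselines and metrics):

    // Difference-mode overlay as described above: render the same sample in
    // Helvetica, then Arial, with a "difference" blend. Identical pixels
    // cancel to black; any non-black pixel marks where the two renders
    // (or their antialiasing) diverge.
    function typefaceDifference(sample = "Hamburgefonstiv"): ImageData {
      const canvas = document.createElement("canvas");
      canvas.width = 400;
      canvas.height = 40;
      const ctx = canvas.getContext("2d")!;
      ctx.fillStyle = "#000";
      ctx.fillRect(0, 0, canvas.width, canvas.height); // black backdrop
      ctx.fillStyle = "#fff";
      ctx.font = "16px Helvetica";
      ctx.fillText(sample, 4, 28);
      ctx.globalCompositeOperation = "difference";
      ctx.font = "16px Arial";
      ctx.fillText(sample, 4, 28);
      return ctx.getImageData(0, 0, canvas.width, canvas.height);
    }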

Thus there are two very standard font types that can provide a usable benchmark for size and weight against which all other fonts can be measured. It should be noted that different foundries may have some of their own differences; nevertheless, at the moment I am looking at Helvetica and Times as that baseline.

Humanist & geometrical fonts like Helvetica don't really have a unique italic version, so italics there are not that instructive. Serif fonts like Times, though, do have an italic version that is markedly different from the regular typeface.

There may be a reason to add a script font into the mix, but the idea is to keep that list very, very small, as its purpose is a clear and defined frame of reference for weight and size within the line-height. (On that last point, a "16px" font is normally about 12px high for the tallest glyphs, such as capital letters, with 4px of line-space.)

Side note: users can change the font-face and other factors; we have a guideline aimed at supporting that. However, that is for users prepared to replace fonts and adapt a page; the contrast guidelines are about the intended (authored) view.

Yes, using style sheets — I am aware of that; I said in the post "assuming they don't use a style sheet." Because doing that level of customization is cumbersome, I'd imagine only those with no alternative spend the effort.

What weights are supported is (or should be) a function of the CSS font import definitions at the head of the CSS sheet(s). It's either there or it isn't, so I'm not sure of the point you're getting at. That my font import could only include the regular weight, and my CSS could define it as bold. A test that interrogates the CSS will not know that, so the result would not be valid. That's why I defined it as "difficult"; it's theoretically possible to also analyse the font import, but not easy (unless you know a method?).

Hmmm. Well, the font import is how a certain font weight is set up for the rest of the CSS: it states a weight such as 400 and then assigns it a CSS name like "normal."

    @font-face {
      font-family: avenir;
      src: url('/FONTS/Avenir/AvenirLTStd-Light.otf');
      font-weight: 100;
      font-style: normal;
    }

So here I took the font file that is named "Light" and assigned it the CSS weight of 100. This goes at the head of the CSS sheet.

Regarding the font-family cascade/fall-back, if a site is importing a font so that (by default) the browser downloads it, we should consider that the main focus of testing. If a site puts in an uncommon font first, and backs up with standard fonts, that's a trickier issue, but my first thought is to test with the first font and then the first common font. Open to suggestions there, but generally I find people either stick to standard fonts or import them. (Unless you want to impress the CEO who has a particular font installed ;-) )

There are some old browsers (prior to 2011) that don't support the CSS3 font import paradigm, but it's pretty widely supported (Opera Mini does not support it at all). Any uncommon font should be imported; there are very few fonts common enough not to need an import (Helvetica/Arial and Times/Times New Roman in standard weights — and weights other than normal and bold need to be imported even for these common fonts).

HTTP requests a flag that was "prefer accessible fonts"...

Patrick is right (if grumpy, ahem) about the scope. I've been saying profiles would be really helpful for over a decade, but sayin' don't make it so. As these guidelines are aimed at what websites should do, they take the current state of user-agents as the context. If we stick in a user-agent requirement, nothing happens.

I do understand that, but the idea is sound and fairly trivial to implement — and the W3C has meetings and conferences and working groups developing CSS4, etc. I realize it's another distant branch of affairs, but it's also part of the same organization. Yes, it requires a plug-in or browser that incorporates it. But I like your idea of authoring a plug-in first.

I think (others may know better) that the roadmap for that would be:

  • Create a prototype, e.g. browser plugin plus some form of server-side code to demonstrate the concept.
  • Post about it on discourse.wicg.io.
  • Hopefully get some (browser) implementor interest in standardising as part of HTML/CSS.
  • Loop back around to WCAG to create a guideline to enforce that.

CSS4 has things in it that are yet to be realized or supported by any browser, some came about from a conference in San Francisco. But I understand that much of the W3C standards are pushed by browser developers.

I'd think Apple would be a good candidate for adoption, as they have always had a strong policy toward accessibility.

Having said that, Silver (WCAG.next) will have user-agents in scope to a certain extent, so there may be more options in future.

I'll contact you by email about the conference calls, there's some admin around that and I'd like to make sure it is useful for you, and the group.

Thank you Alastair!

Andy

alastc commented 5 years ago

It appears to me that one of them is using a different font for the menu items.

The left is Firefox, the right is Edge (on Mac), but you get similar results from Chrome on windows as well. It isn't a very good font, but it's an example that came up for rendering differences.

Hmmm. Well the font import is how a certain font weight is setup for the rest of the CSS, so there it states a weight such as 400, and then assigns a CSS name like "normal."

Sure, that's how it's supposed to work, but in the CSS I can still set text using that font to bold (and in some cases it even works, not sure why, haven't dug into that).

I do understand that, but the idea is sound and fairly trivial to implement — and W3C has meetings and conferences and working groups developing CSS4 etc

There isn't a CSS4, but the general point is that ideas may be raised and added to drafts, but nothing gets published (as a recommendation, 'rec') until it's in two different implementations.

it isn't so much pushed by browser devs (although some features might be), but they are the implementers, and implementations precede standardisation. See posts from Alex Russell.

The crux being that ideas should work for end-users, and implementors need to be convinced of that case. Working code is very helpful in that.

Myndex commented 5 years ago

The left is Firefox, the right is Edge (on Mac), but you get similar results from Chrome on windows as well. It isn't a very good font, but it's an example that came up for rendering differences.

Can I get that URL? I'd like to investigate the cause.

There isn't a CSS4, but the general point is that ideas may be raised and added to drafts, but nothing gets published (as a recommendation, 'rec') until it's in two different implementations.

CSS level 4, which everyone types as CSS4... :)

I have been reading the notes in level 4 and noticed references to "things to add as a result of San Francisco" — that is what I was referring to, such as "working spaces" and a few other things yet to be implemented, etc.

it isn't so much pushed by browser devs (although some features might be), but they are the implementers, and implementations precede standardisation. See posts from Alex Russell.

The crux being that ideas should work for end-users, and implementors need to be convinced of that case. Working code is very helpful in that.

Makes sense. Well, my plate is overfull, so the flag for an accessible-font extension will be way on the back burner unless the idea catches the attention of an extension developer...

alastc commented 5 years ago

The URL was above in the bit you quoted, see "same page".

Myndex commented 5 years ago

The URL was above in the bit you quoted, see "same page".

Hey @alastc, here is the likely issue, relating to "main.css":

  1. You have defined your font import, only listing a single entry for "normal" weight (which is 400)
  2. The nav styles set the font-weight to 700

[screenshot] It sorta works for Safari and Chrome on Mac, as those browsers create a "synthetic font" when there is no font file available with those specific attributes. It's the same as when you set italic and the imported font does not include an italic style — the browser modifies the fonts it DOES have. And in THIS case, Safari makes a "synthetic BOLD" font differently than Edge does.

To show this, I turned OFF the font-weight: 700; in the developer menus for both Safari and Edge: [screenshot]

And here is Safari (left) and Edge (right) with that synthetic bold OFF: [screenshot]

Synthetic font versions will have a great deal of variance between browsers because the font is being "regenerated" as a faux replacement. For consistent appearance it's best to avoid synthetic versions. With humanist sans-serif fonts like Helvetica, it's sometimes okay to allow the synthetic version for ITALICS, unless you use italics heavily. But "synthetic" features should never be used for font-weight if it can be avoided, partly for the reasons that cause this very problem.

Really, all a browser can do is "make the font uniformly thicker" — but a font designer does a lot more when creating different weights of a font. I recommend adding an import declaration for the Proxima Neue BOLD weight.

BELOW: With the faux bold turned ON, note that Safari uses additional letter spacing for its faux bold:

[screenshot]

Hope this helps.

(Also, just a nitpick, but best practice indicates never having spaces in URLs; it is advisable to use underscores or CamelCaseStyle for all file and folder names that may become part of a URL, such as those for the location of this font file. The space character is considered UNSAFE:

Per RFC 1738: "Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs.")

alastc commented 5 years ago

Thanks, that'll save me digging :-)

It does support my point though: There is variety in rendering and just using the CSS as the basis for testing is not enough.

Myndex commented 5 years ago

Thanks, that'll save me digging :-)

No prob, I was curious as to the cause.

It does support my point though: There is variety in rendering and just using the CSS as the basis for testing is not enough.

Though this was due to a font-weight not being defined, which kinda supports my point ;-)

If a font is imported and fully defined, then that font can be used for validation. If the font is "local" or undefined such that it's "synthetic" then it cannot be validated because it is "non-controlled."

awkawk commented 5 years ago

This will get further work, but closing.

alastc commented 5 years ago

Just noting Bruce's comment from the survey:

What about just having a requirement that (at AA) end users have the option to avoid fonts that are ornamental, decorative, unusually thin, or otherwise likely to be difficult to read? And at AAA, that the site owner provide end users the choice of at least two font faces, one serif and one sans serif?

patrickhlauke commented 5 years ago

would this require each site to now start providing custom settings and style switchers? sounds overly restrictive for something that a user would most likely be better served with at UA level, with global font preferences/overrides, instead of mandating it for authors to implement. (and it would also then require some normative, measurable, objective definition of what is and isn't "unusually thin or difficult to read")

jake-abma commented 5 years ago

For AA, who will be the judge of that? How do we set the standard for fonts? Also, from a design and technical perspective, I don't see this happening easily.

For AAA, this doesn't solve the contrast/weight issue at all; you can still have a bad serif AND sans serif. And again, who will be the judge?


The greatest issue to solve here is that we just don't have an industry standard for fonts and weights, and there are lots of differences/custom adjustments. As long as this is not solved, it will be hard (impossible?) to make a clear/testable SC.

Myndex commented 5 years ago

For AA, who will be the judge of that? ... The greatest issue to solve here is that we just don't have an industry standard for fonts and weights, and there are lots of differences/custom adjustments. As long as this is not solved, it will be hard (impossible?) to make a clear/testable SC.

The "greatest" issue is that the problems here with fonts (and also contrast and other related visual issue) is that there is much that is very subjective. In additional, there is a big basket of design elements that are interrelated and not independent.

For instance: contrast threshold is an objective "forced answer" as to whether something is visible or not. And while contrast threshold is an important metric for clinical diagnoses, it is not really relevant for instructing us in setting contrast for a web page. Normal vision is 1% (RS 2) and a profound impairment is 10% (RS 1), but that does not mean that someone at 10% needs ten times the contrast on a web page to achieve a similar reading speed.

As such, reading speed, not contrast threshold, tells us where things need to be set for accessibility. But reading speed is difficult to assess and is affected by cognition, adaptation, acuity, aberration, glare, scatter, size, spacing, stroke density, antialiasing, edge contrast, body contrast, container contrast, modulation...

For instance, the research shows that, objectively, serif vs. sans serif on a computer monitor does not make a difference in reading speed; on the other hand, using serif fonts causes the letter spacing to increase, which does help reading speed.

When it comes to color and luminance, we are only able to predict certain aspects of color and contrast because we have a functional model: the CIE 1931 XYZ (and later related LAB/LUV) models of human vision, all based on a "standard observer", which is the average visual response of a group of 17 young British males who were part of the circa-1931 experiments.

There is no "standard observer" model for reading speed that can be applied to anything, though Legge et al. have come closest with their "Mr. Chips" programmatic model.

And contrast sensitivity is not particularly instructive. Contrast letter sensitivity does not tell us whether a particular ornamental font is legible or not; there is no relationship there.

As for the legibility (defined as reading speed) of an "ornamental font", I am not aware of any such specific research — but if other research (such as that on serifs) is a guide, ornamental fonts are probably not an accessibility issue, by which I mean a normal-vision person would probably have as much of a decrease in reading speed as a low-vision person with a "difficult" ornamental font.

For instance, blackletter/Olde English fonts are notoriously hard to read for everybody!

patrickhlauke commented 5 years ago

so, with all that said...seems to me this points to the fact that we can't make the flawed model of contrast ratio here any better by adding more flawed/subjective factors on top of it?

WayneEDick commented 5 years ago

There is a lot of research indicating that font weight does not improve readability (perception). It seems like we shouldn't adjust contrast ratios for something that is not justified by research.

We use color contrast because there is a long body of research supporting the hypothesis that increased contrast improves readability for a large number of users (normal and low vision). The problem with font weight is that the starting font is not, in general, designed to be thickened. There are fonts that are designed to be bold or demi-bold; these are deemed readable not from clinical testing, but from review by typographers. Font weight doesn't work that way.

Bottom line: I see no reason to adjust contrast ratio for higher weight font variants.

Best, Wayne


patrickhlauke commented 5 years ago

There is a lot of research indicating that font weight does not improve readability (perception).

What about the converse: fonts that are exceedingly thin/ultra-thin? Empirically, with all other things (color, x-height, etc.) being equal, thin variants are much harder to read and visually appear lighter than their regular/roman counterparts (not to mention that, due to antialiasing, they tend to render much lighter on a light background / darker on a dark background).

The problem with font weight is that the starting font is not, in general, designed to be thickened. There are fonts that are designed to be bold or demi-bold.

Worth also noting that it's no longer the case that fonts just get artificially "fattened" / made bold by browsers. Indeed, with variable fonts, and with web fonts that draw on separate actual font files for different weights, setting text to a particular font-weight actually uses the correctly designed cut/variant when available.

Myndex commented 5 years ago

Font weight DOES affect reading speed, and there is research that describes this. Both too thin and too thick can have a negative impact.

However, at the moment there is no standard metric that describes this in a way that can be used to compare two fonts.

These are issues with weight:

1) Under a certain thickness, the anti-aliasing of the rasterizer will blend the font color into the background color causing a massive change in contrast.

2) Under a certain thickness, individual visual acuity problems (focus) will cause the same perceptual effect.

3) Over a certain thickness contrast modulation is reduced (see Michelson Contrast).

4) Over a certain thickness the internal glyph structure becomes "closed" (such as the hole in the lowercase "e"), reducing legibility and reading speed.

5) ALL of the above are a FUNCTION OF SIZE.

6) SOME of the above are interrelated to letter spacing, especially 3.

7) NONE of the above is a function of color contrast, but SOME of the above directly affect perceived color contrast (especially No. 1), and there is a relationship to contrast modulation.


SIZE is the single biggest determinant of maximum reading speed. The best size falls inside a small range: fonts that are either too small OR TOO BIG cause a reduction in maximum reading speed.

Readability vs Font stroke thickness is a function of Michelson Contrast (aka contrast modulation).
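For reference, Michelson contrast is the standard modulation measure:

    // Michelson contrast (contrast modulation): (Lmax - Lmin) / (Lmax + Lmin),
    // in the range 0..1. As strokes thicken at small sizes, the rendered
    // light/dark extremes converge and modulation drops.
    function michelsonContrast(lMax: number, lMin: number): number {
      return (lMax - lMin) / (lMax + lMin);
    }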

Contrast modulation is also affected by Acuity (minimum circle of confusion or point spread function).

And I'm not even mentioning adaptation, B&B Surround, etc etc.

The point (discussed in this week's teleconference) is that the work relating to font weight is essentially subsumed into the research we are doing for issue 695.

This new and in-depth research is going to eventually generate a comprehensive new set of standards that (hopefully) cover all of these interdependent issues.

I.e. this issue is listed as closed, but far from forgotten. It will be included in a future all-encompassing standard.