terrazzoapp / terrazzo

Use DTCG tokens JSON to generate code for web, mobile, native apps, and more
https://terrazzo.app
MIT License

Contrast Tests (and others?) #214

Closed · drwpow closed this 6 months ago

drwpow commented 6 months ago

Proposal

Many developers run contrast checks, either through annual audits or E2E tests. And there are tools in Figma that alert you to contrast issues. But what if you could run contrast checks automatically, based only on what’s in tokens.json? What if you could, say, specify a foreground color, a background color, and a typography style, and have Cobalt verify you’re meeting minimum contrast ratios? Cobalt could even take modes into account!

The main idea is that whenever you run co build, it would fail with an error if tests didn’t pass (or this could be a separate co test command; not sure). You could create as many token groups as you’d like, and assert whatever levels of compliance you need.

The contrast checker most people are familiar with is the simple WCAG 2 model, which lets you naïvely compare any two colors and get a contrast ratio. But there are bigger, scarier contrast algorithms like APCA, which not only give different scores depending on which color is the foreground and which is the background, but also factor in the font family, size, and weight used with those colors. Such a complex algorithm would be great to run close to your tokens, especially in CI!
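
For reference, the WCAG 2 side is simple enough to sketch in a few lines of plain JavaScript. This is just the published formula, not Cobalt code, and it assumes sRGB channels normalized to 0–1:

// WCAG 2.x relative luminance, then (L1 + 0.05) / (L2 + 0.05)
function luminance([r, g, b]) {
  // linearize each sRGB channel
  const [R, G, B] = [r, g, b].map((c) => (c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4));
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}

function wcag2Contrast(fg, bg) {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05); // 1 (identical colors) up to 21 (black on white)
}

wcag2Contrast([1, 1, 1], [0, 0, 0]); // 21, which passes AAA (≥ 7 for normal text)

APCA has no similarly compact form, which is all the more reason to let tooling run it for you.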

Possible implementations

1. Config

The simplest implementation, e.g.:

export default {
  tests: [
    {
      type: 'contrast',
      tokens: {
        foreground: '{color.semantic.text}',
        background: '{color.semantic.bg}',
        typography: '{typography.base}',
        modes: ['light', 'dark'],
      },
      expect: {
        apca: 'Lc 90',
        wcag2: 'AAA',
      },
      severity: 'error', // 'error' | 'warn' | null,
    },
  ],
}

Pros: easy to test the idea, easy for users to quickly write tests

Cons: very limiting, as it won’t allow for “custom” user tests; either a test is built-in or it’s not supported (so realistically there’d only be a couple of test types)

2. Lint plugins

Plugins get a new lint stage, which means they could accept any possible arguments for that stage (here, checks, but this could be named anything, and plugins could even reuse config you’ve already passed in).

import contrast from '@cobalt-ui/plugin-contrast';

export default {
  plugins: [
    contrast({
      checks: [
        {
          tokens: {
            foreground: '{color.semantic.text}',
            background: '{color.semantic.bg}',
            typography: '{typography.base}',
            modes: ['light', 'dark'],
          },
          expect: {
            apca: 'Lc 90',
            wcag2: 'AAA',
          },
          severity: 'error', // 'error' | 'warn' | null,
        },
      ],
    }),
  ],
}
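
To make the “lint stage” concrete, here’s a rough sketch of the plugin author’s side. The lint() hook, its arguments, and report() are all assumptions for illustration, not a real API; wcag2Contrast is the helper sketched earlier, and token lookup and mode handling are hand-waved:

// hypothetical plugin shape; lint(), report(), and token lookup are assumptions
export default function contrast(options) {
  return {
    name: '@cobalt-ui/plugin-contrast',
    lint({ tokens, report }) {
      for (const { tokens: pair, expect, severity } of options.checks ?? []) {
        const fg = tokens[pair.foreground]; // e.g. '{color.semantic.text}'
        const bg = tokens[pair.background];
        const ratio = wcag2Contrast(fg.value, bg.value);
        const min = expect.wcag2 === 'AAA' ? 7 : 4.5; // WCAG 2 normal-text thresholds
        if (severity && ratio < min) {
          report({
            severity,
            message: `${pair.foreground} on ${pair.background} is ${ratio.toFixed(2)}:1 (needs ${min}:1)`,
          });
        }
      }
    },
  };
}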

Sure, the syntax seems similar to #1, but under the hood there are lots of differences:

Pros: users can write custom tests, plugins can have additional “lint” awareness, and the tests could use more plugin-aware code (e.g. in addition to your core tokens.yaml lint rules, maybe you have additional linters that kick in when outputting CSS vars, such as warning on P3-gamut colors?)

Cons: if plugins try to be both “lint” and “build” plugins, it can be a lot of work for the maintainer

3. Testing

This ticket uses the word “testing” as a catch-all, but in this option I mean testing-testing: actually shipping a true testing package that executes inside a runner:

// contrast-test.tokentest.js

import assert from 'node:assert';
import { test } from 'node:test';
import { passesContrast } from '@cobalt-ui/testing';

const colorPairs = [/* (foreground, background, typography tokens) */];

for (const colorPair of colorPairs) {
  test("color combo passes accessibility", () => {
    assert.deepStrictEqual(passesContrast('./tokens.json', colorPair), { apca: 'Lc 90', wcag2: ‘AAA' });
  });
}

Pros: power, unlimited power!!!

Cons: this is woefully over-engineered, and would not be fun for me to maintain. Unit tests, by their nature, test runtime code. Tokens are statically analyzable, and there’s no need to pretend they have a runtime. This is not a serious option; I just wanted to write the idea down.


Of the three, I’m leaning heavily toward #2 as the option with the most potential; it could even ship in v1 without waiting for the 2.0 plugin API. Plus, kicking the tires now may inform how this system carries over into 2.x.

Other tests?

Other than contrast, are there any other “tests” that would be helpful to run on your design tokens? Ideally ones that can run programmatically and don’t require any heavy browser setup.

We could even have additional check types, like asserting that certain token IDs exist, or that certain modes are defined.
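
For example, following the option 1 config shape, those could be additional check types (purely hypothetical names and shapes):

export default {
  tests: [
    // assert these token IDs exist at all
    { type: 'required-tokens', tokens: ['{color.semantic.text}', '{color.semantic.bg}'], severity: 'error' },
    // assert a token defines every expected mode
    { type: 'required-modes', tokens: '{color.semantic.text}', modes: ['light', 'dark'], severity: 'warn' },
  ],
}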