flaky CI tests (maybe term.reset related)

CI sometimes produces test failures for no obvious reason. These failures are not reproducible locally and seem more likely when the CI machine is under heavy load.

Example (from the weblinks tests): all neighboring tests take a moderate ~130ms, while one test runs into a timeout. These tests poll for a certain change to happen in the DOM, but for some reason that change never happened.

Our Playwright test modules all run in the same page on the same terminal instance, but separate the tests with a

  test.beforeEach(async () => {
    await ctx.page.evaluate(`
      window.term.reset();
      window._some_addon?.dispose();
      window._some_addon = new SomeAddon();
      window.term.loadAddon(window._some_addon);
    `);
  });

to reset the terminal and the tested addon to their initial state. This made me suspect that something might be off with the reset handling here. The suspicion is backed by the fact that introducing a wait after term.reset solves the issue:

  test.beforeEach(async () => {
    await ctx.page.evaluate(`
      window.term.reset();
      window._linkaddon?.dispose();
    `);
    await timeout(10);
    await ctx.page.evaluate(`
      window._linkaddon = new WebLinksAddon();
      window.term.loadAddon(window._linkaddon);
    `);
  });

Some digging into term.reset reveals that it is indeed not 100% synchronous. Repro:

run the demo
generate some terminal output, e.g. run ls
open the browser console
run term.reset(); while (true) {} --> the terminal is not cleared before entering the busy loop, so some queued task must be involved

Is it a microtask?

do the same as above, but run term.reset(); Promise.resolve().then(() => {while (true) {}}) instead --> nope, the terminal is still not cleared

Is it a macrotask?

do the same as above, but run term.reset(); setTimeout(() => {while (true) {}},0) instead --> yes, the terminal gets cleared before entering the busy loop (Edit: that's still not quite correct; as I found out below, requestAnimationFrame is the real culprit)

So yes, this is the infamous nextTick issue that many Node.js devs will know, but with a pending task on the browser's macrotask queue.
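For readers less familiar with the distinction, here is a minimal sketch of the ordering the three experiments above probe. It is independent of xterm.js; `recordOrder` is a hypothetical name used only for illustration.

```typescript
// Records the order in which sync code, a microtask and a macrotask run.
function recordOrder(log: string[]): Promise<string[]> {
  return new Promise<string[]>(resolve => {
    // macrotask: runs only after the current task and all pending microtasks
    setTimeout(() => {
      log.push('macrotask (setTimeout)');
      resolve(log);
    }, 0);
    // microtask: runs right after the current synchronous code has finished
    Promise.resolve().then(() => log.push('microtask (promise)'));
    // synchronous code: runs first
    log.push('sync');
  });
}

const order: string[] = [];
recordOrder(order).then(result => console.log(result.join(' -> ')));
// prints "sync -> microtask (promise) -> macrotask (setTimeout)"
```

Anything term.reset defers to a later task sits behind both the synchronous code and the microtasks, which matches the busy-loop observations above.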

So why is this a problem during test execution? Because the way we use it, tests get chained on the microtask queue under the hood:

  await beforeEach();
  await test1();
  await beforeEach();
  await test2();
  ...

So term.reset() in beforeEach queues its cleanup as a macrotask, but nothing awaits that macrotask: the tests are chained through promises (promise callbacks are microtasks, setTimeout callbacks are macrotasks). The promise chain can therefore progress without the output cleanup ever having run. Adding the timeout above helps here, since it introduces a wait on a macrotask by relying on setTimeout:

export async function timeout(ms: number): Promise<void> {
  return new Promise<void>(r => setTimeout(r, ms));
}
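The difference between the two kinds of waiting can be sketched outside the test suite. In the snippet below, `deferCleanup` is a hypothetical stand-in for work deferred to a later task; it is an assumption for illustration and not what term.reset actually does internally.

```typescript
// Same helper as above: resolves from a setTimeout callback (a macrotask).
function timeout(ms: number): Promise<void> {
  return new Promise<void>(r => setTimeout(r, ms));
}

// Hypothetical stand-in for cleanup work that is deferred to a later task.
function deferCleanup(onDone: () => void): void {
  setTimeout(onDone, 0);
}

// Awaits only already-resolved promises, so it never leaves the microtask queue.
async function waitOnMicrotasksOnly(): Promise<boolean> {
  let cleaned = false;
  deferCleanup(() => { cleaned = true; });
  await Promise.resolve();
  await Promise.resolve();
  return cleaned;
}

// Awaits a timer, which gives the earlier queued macrotask a chance to run.
async function waitOnMacrotask(): Promise<boolean> {
  let cleaned = false;
  deferCleanup(() => { cleaned = true; });
  await timeout(10);
  return cleaned;
}

waitOnMicrotasksOnly().then(cleaned => console.log('microtasks only:', cleaned)); // false
waitOnMacrotask().then(cleaned => console.log('after timeout(10):', cleaned)); // true
```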

Solution: The best solution would be to make term.reset fully synchronous, i.e. to clean up the output with sync code.

The second-best solution is to fix all Playwright tests with a timeout and no longer advertise term.reset as fully synchronous.

So the issue is in fact a bit more complicated. I added this test snippet to an addon test:

  test.describe.only('buggy', () => {
    const logs: string[] = [];
    const MAX = 19;
    for (let i = 0; i <= MAX; ++i) {
      test('' + i, async () => {
        // snapshot of the first rendered row, taken right after beforeEach did its reset
        const content: string = await ctx.page.evaluate(`document.querySelector('.xterm-rows > div').innerHTML`);
        logs.push(content.slice(0, 50));
      });
    }
    // print the collected snapshots once all tests of this block have run
    test.afterAll(() => console.log(logs));
  });

and run the test with

$> yarn test-integration --workers=50% --suite=addon-image

If resetting is perfectly sync, I'd expect this result for every browser:

[
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>'
]

which means the DOM representation of the terminal buffer contains only one cell for the cursor.

I actually get this from Chromium (and to a much lesser degree from WebKit; Firefox seems fine locally):

[
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '',
  '<span> </span>', '',
  '<span> </span>', '<span> </span>',
  '<span> </span>', '',
  '<span> </span>', '<span> </span>',
  '',               '<span> </span>',
  '<span> </span>', ''
]
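Comparing such dumps by eye gets tedious across several runs. A small helper along these lines could flag the offending tests; `findUnfinishedResets` and `CLEAN_ROW` are hypothetical names, and the expected markup is simply taken from the output above.

```typescript
// Markup of a freshly reset first row, as seen in the expected output.
const CLEAN_ROW = '<span> </span>';

// Returns the indices of all snapshots that differ from the clean row.
function findUnfinishedResets(snapshots: string[], clean: string = CLEAN_ROW): number[] {
  const unfinished: number[] = [];
  snapshots.forEach((snapshot, index) => {
    if (snapshot !== clean) unfinished.push(index);
  });
  return unfinished;
}

const sample = ['<span> </span>', '', '<span> </span>', ''];
console.log(findUnfinishedResets(sample)); // [ 1, 3 ]
```

Applied to the Chromium run above, it would report tests 7, 9, 13, 16 and 19.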

So the reset is sometimes not finished on certain browsers. It turns out that it is not setTimeout that fixes it, but requestAnimationFrame as the wait condition in beforeEach:

  test.beforeEach(async () => {
    await ctx.page.evaluate(`
      window.__f = async () => {
        window.term.reset();
        // resolve once the next animation frame has fired
        return new Promise(r => requestAnimationFrame(r));
      };
      window.__f();
    `);
  });

This now reliably fixes the reset handling between tests, even under high load (tested locally up to a load of 20).
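If this wait ends up in many test modules, it could be pulled into a shared helper. The following is only a sketch: `resetTerminal`, `PageLike` and `RESET_AND_WAIT_FOR_FRAME` are hypothetical names, and `PageLike` is a minimal structural stand-in for the Playwright page object so that the snippet stays self-contained.

```typescript
// Minimal stand-in for the Playwright page (only what this sketch needs).
interface PageLike {
  evaluate(script: string): Promise<unknown>;
}

// Script run inside the page: reset, then resolve on the next animation frame.
const RESET_AND_WAIT_FOR_FRAME = `
  (() => {
    window.term.reset();
    return new Promise(resolve => requestAnimationFrame(resolve));
  })()
`;

// Hypothetical helper: resets the terminal and waits for one animation frame.
async function resetTerminal(page: PageLike): Promise<void> {
  await page.evaluate(RESET_AND_WAIT_FOR_FRAME);
}
```

Assuming ctx.page is accepted as a PageLike, a test module would then call `await resetTerminal(ctx.page);` at the top of its beforeEach and set up its addon afterwards.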

xtermjs / xterm.js

flaky CI tests (maybe term.reset related) #5184