microsoft / fast

The adaptive interface system for modern web experiences.
https://www.fast.design
Other
9.28k stars 595 forks source link

fix: playwright tests in fast-foundation run in serial mode #6855

Closed radium-v closed 5 months ago

radium-v commented 1 year ago

🐛 Bug Report

All component tests in fast-foundation are written in a way that alleviates run time for the CI environment, but this causes flaky scenarios since each component's tests are run on the same page. This can lead to stale state conflicts. Additionally, test order is purely sequential, meaning re-running a test requires running all tests which appear earlier in the module.

💻 Repro or Code Sample

Each test module is set up in a similar way, which adds cognitive strain when trying to figure out why we're doing this. Since I'm the one who did this, here's a breakdown:

test.describe("Accordion", () => {
    /* The `page`, `element`, and `root` objects are shared between all tests in
       this block. Synchronous tests don't get their own page object, so they
       have to share. The hoisted Locators are more of a convenience, but they
       break the paradigm for test isolation best practices.
    */
    let page: Page;
    let element: Locator;
    let root: Locator;

    /* The page is only loaded once for all tests. It's the test's
       responsibility to replace the entire fixture body.
    */
    test.beforeAll(async ({ browser }) => {
        page = await browser.newPage();
        element = page.locator("fast-accordion");
        root = page.locator("#root");

        /* There's no reason go to the storybook fixture here, since each test
           replaces the body anyway. We just need a page with the bundle loaded,
           which is currently borrowed from Storybook.
        */
        await page.goto(fixtureURL("accordion--accordion"));
    });

    /* The page has to be held open until all tests complete. I don't know if
       Playwright stops the process after retrying a failed test. This could be
       an issue for memory management.
    */
    test.afterAll(async () => {
        await page.close();
    });

Here's a test fixture example:

    /* Normally the function would provide `page`, but doing this breaks the
       serial execution of tests. All tests need to be parallel or serial.
    */
    test("should set an expand mode of `single` when passed to the `expand-mode` attribute", async () => {
        /* Each test has to be written depending on the previous test's state,
           or rather, the state of the fixture. This is because the fixture is
           not reloaded between tests. In most cases, this means replacing
           the root element's contents, but some tests might involve event
           handlers or may affect the body or window (YMMV)
        */
        await root.evaluate(node => {
            node.innerHTML = /* html */ `
                <fast-accordion expand-mode="single">
                    <fast-accordion-item>
                        <span slot="heading">Heading 1</span>
                        <div>Content 1</div>
                    </fast-accordion-item>
                    <fast-accordion-item>
                        <span slot="heading">Heading 2</span>
                        <div>Content 2</div>
                    </fast-accordion-item>
                </fast-accordion>
            `;
        });

        /* `element` and `root`, as well as other locators, are hoisted to the top
           of the `test.describe` scope. If a test changes the locator, it affects
           every proceeding test (unless it's changed back).
        */
        await expect(element).toHaveAttribute("expand-mode", AccordionExpandMode.single);
    });

🤔 Expected Behavior

Ideally, tests would be written in the default Playwright way, where each one loads a new page, sets up its own locators and evaluations, then gets cleaned up automatically by Playwright. This would allow for test sharding and parallel workers, and would make it easier for test authors to isolate each test scenario.

Here's the same setup as above, but written in the preferred style:

test.describe("Accordion", () => {
    test("should set an expand mode of `single` when passed to the `expand-mode` attribute", async ({ page }) => {
        await page.goto(fixtureURL("debug--blank"));

        const root = page.locator("#root");

        await root.evaluate(node => {
            node.innerHTML = /* html */ `
                <fast-accordion expand-mode="single">
                    <fast-accordion-item>
                        <span slot="heading">Heading 1</span>
                        <div>Content 1</div>
                    </fast-accordion-item>
                    <fast-accordion-item>
                        <span slot="heading">Heading 2</span>
                        <div>Content 2</div>
                    </fast-accordion-item>
                </fast-accordion>
            `;
        });

        const element = page.locator("fast-accordion");

        await expect(element).toHaveAttribute("expand-mode", AccordionExpandMode.single);
    });

😯 Current Behavior

There are two reasons why I went with serialized tests.

The first was an effort to utilize Storybook beyond local development. My hopes were (and still are) that we would serve the Storybook build for each PR, and then run all of the PR's component tests against that deployment. This would also give us the opportunity to use each instance as a reference when doing code reviews. In the past, Storybook's page-loading mechanisms caused intermittent issues when it came to page state, and unfortunately my preliminary investigations for #6698 indicate that this behavior may be even more pronounced when using Vite.

The second was that performance on our CI machines is just really bad for these sorts of problems. fast-foundation currently has almost 3000 tests (accounting for all three configured browser engines). Running all of these tests locally takes about a minute, but I have a 32-thread workstation that's optimized for high performance compute. In comparison, the Github Actions machines have... 2 threads. Technically, I think the MacOS machines have 3. Either way, Playwright only utilizes 50% of the available threads to run tests in parallel mode, but that only really works if there are four or more threads available. Half of two is one, and you can't really parallelize 3000 tests on one thread - not to mention that we run all tests from all packages in streaming mode with Lerna. So, to run tests in parallel, we'd be using one of only two CPU threads, while Lerna is simultaneously running all of the other package tests in streaming mode across both cores.

💁 Possible Solution

A solution for the first problem would be to separate the Storybook part from the tests part. We'd need to create a test harness that can be hosted either locally or deployed as a CI step. Since mostly every test already has its own markup, we just need a page that loads the components bundle. In the past, this was handled by the fast-components build, but now a special build would be needed specifically for the test components.

As for the second problem, slow CI performance shouldn't be an excuse to write flaky test code. The tests should be rewritten in the proper parallel fashion, and techniques like sharding and selective build steps should be implemented to avoid running extra cycles. There may be ways for the core maintainers to get access to runners with more cores/threads as well.

janechu commented 5 months ago

Appears to be resolved by #6929