[BUG] Trace viewer sometimes returns blank zip files when running "retain-on-failure" and stops recording on second test failure

cullman1 commented 2 years ago

Context:

Playwright Version: 1.17.1
Operating System: Windows 10 Pro - 21H1 - 19043.1387
Node.js version: 16.9
Browser: [All]
Extra: Running 312 tests, in parallel, max 4 at a time

Code Snippet

import { PlaywrightTestConfig } from "@playwright/test";

const config: PlaywrightTestConfig = {
  projects: [
    {
      name: "TestsInChromium",
      use: {
        browserName: "chromium",
        headless: true,
      },
      timeout: 35900,
    },
    {
      name: "TestsInFirefox",
      use: {
        browserName: "firefox",
        headless: true,
      },
      timeout: 39500,
    },
  ],
  use: {
    baseURL: "http://localhost:8000",
    trace: "retain-on-failure",
  },
  globalSetup: require.resolve("./playwright.globalSetup.ts"),
};

export default config;

Describe the bug

Runningtrace-on-failure setting generates a whole set of trace{x}.zip files for a single test, most of which show up as empty with "No actions recorded" when I open them - even though they have file sizes. One or two do work, but it's random as to which one it is (middle or last). Also after the first test has failed, when the second test fails, it doesn't create a folder or any zip files, instead I get:

```browserContext.newPage: Protocol error (Browser.newPage): Browser closed.
==================== Browser output: ====================
<launching> C:\Users\chris.ullman\AppData\Local\ms-playwright\firefox-1304\firefox\firefox.exe -no-remote -headless -profile C:\Users\CHRIS~1.ULL\AppData\Local\Temp\playwright_firefoxdev_profile-6OhGb8 -juggler-pipe -silent
<launched> pid=96
[pid=96][err] *** You are running in headless mode.
[pid=96][out]
[pid=96][out] Juggler listening to the pipe
[pid=96][out] console.warn: SearchSettings: "get: No settings file exists, new profile?" (new NotFoundError("Could not open the file at C:\\Users\\chris.ullman\\AppData\\Local\\Temp\\playwright_firefoxdev_profile-6OhGb8\\search.json.mozlz4", (void 0)))
[pid=96][out] Crash Annotation GraphicsCriticalError: |[0][GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt (t=17.6852) [GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt
[pid=96][err] JavaScript error: resource://gre/modules/XULStore.jsm, line 66: Error: Can't find profile directory.
[pid=96][err] JavaScript error: resource://gre/modules/XULStore.jsm, line 66: Error: Can't find profile directory.
[pid=96][err] JavaScript error: resource://gre/modules/XULStore.jsm, line 66: Error: Can't find profile directory.
[pid=96][err] JavaScript error: , line 0: TypeError: NetworkError when attempting to fetch resource.
[pid=96][err]
[pid=96][err] ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[pid=96][err]
[pid=96][err] Exiting due to channel error.```

Happy to supply more info as needed.

pavelfeldman commented 2 years ago

Could you give us an idea on what your tests look like?

cullman1 commented 2 years ago

Here's on particularly convoluted test that throws sporadic errors when running in parallel (but exactly the kind of functionality we need to determine if we have a bug or not)


test("Provisions - Pupils tab - Add pupil goals date", async ({
  adminUserPage: page,
  testSchoolId,
  db,
}) => {
  const customPupil1 = (p: Pupil): Pupil => {
    const upn = "W000100100033";
    return {
      ...p,
      id: `Schools/${testSchoolId}/Pupils/${upn}`,
      Upn: upn,
      FirstName: "Truman",
      LastName: "Trudeau",
      DateOfBirth: "2016-08-25T00:00:00.0000000",
      Gender: "Male",
      StartDate: "2020-09-01T00:00:00.0000000",
      EndDate: null,
      Ethnicity: "WBRI",
      IsEal: false,
      EyfsProfileCharacteristics: {
        PlayingAndExploring: null,
        ActiveLearning: null,
        CreatingAndThinkingCritically: null,
      },
      YearGroups: {
        "2017": -3,
        "2018": -2,
        "2019": -1,
        "2020": 0,
        "2021": 1,
        "2022": 2,
        "2023": 3,
        "2024": 4,
        "2025": 5,
        "2026": 6,
      },
      BirthTermName: "Summer",
    };
  };
  const pupil = buildPupilDocument(testSchoolId, customPupil1);
  await db.putDocuments(pupil);
  await Promise.all([
    page.waitForResponse(/\/provision-types/).then(requireStatus(200)),
    page.goto(`/schools/${testSchoolId}/provision-mapping`),
  ]);
  await Promise.all([
    page.waitForResponse(/\/pupil-goal-titles/).then(requireStatus(200)),
    page.click('text="Add pupil goal"'),
  ]);
  await Promise.all([
    page.waitForResponse(/\/pupil-goal-titles/).then(requireStatus(200)),
    page.fill('input[name="title"]', "Reading"),
  ]);
  await page.fill("data-test-id=add-goal-date", todayDate);

  const testElement = await page.waitForSelector("data-test-id=add-goal-date");
  expect(await testElement.inputValue()).toBe(todayDate);
  await Promise.all([
    page.waitForResponse(/\/batch-pupil-search/).then(requireStatus(200)),
    page.fill('[placeholder="Search by name, or paste a list"]', "tru"),
  ]);
  await page.click("text=Trudeau, Truman");
  await Promise.all([
    page
      .waitForResponse(/\/commands/)
      .then(requireStatus(200))
      .then(() =>
        page
          .waitForResponse(/\/api\/.+\/provision-mapping/)
          .then(requireStatus(200))
      ),
    page.click('button:has-text("Save")'),
  ]);

  await page.goto(`schools/${testSchoolId}/provision-mapping/goals/report`);
  await page.click("text=Reading");
  const test2 = await page.waitForSelector(`text=due today`);
  expect(await test2.innerText()).toBe(`Review due today
Review`);
});```

cullman1 commented 2 years ago

A simple one looks like this:


test("Dates on Calendar", async ({ adminUserPage: page, testSchoolId }) => {
  await page.goto(
    `/schools/${testSchoolId}/admin/terms?academicYear=${yearStart}`
  );
  await page.waitForSelector("input[type=date]");
  const termDates = await page.$$("[type=date]");
  expect(await termDates[0].getAttribute("value")).toContain(
    `${yearStart}-09-01`
  );
  expect(await termDates[1].getAttribute("value")).toContain(
    `${yearStart}-10-31`
  );
});

cullman1 commented 2 years ago

We have a database fixture that puts and deletes documents on a RavenDb for the tests

 db: async ({}, use) => {
    const db: DatabaseFixture = {
      async putDocuments<T extends { id: string }>(...docs: Array<T>) {
        const session = documentStore.openSession();
        for (const document of docs) {
          await session.store(document, document.id);
        }
        session.advanced.waitForIndexesAfterSaveChanges();
        await session.saveChanges();
        session.dispose();
      }, ....

cullman1 commented 2 years ago

And a fixture that creates a page, and logins in the user automatically for each test:

function buildSchoolPageFixture(
  userAccessLevel: "Normal" | "Admin" | "Governor"
) {
  return async (
    {
      testSchoolId,
      db,
      browser,
    }: {
      testSchoolId: string;
      db: DatabaseFixture;
      browser: Browser;
    },
    use: (page: Page) => Promise<void>,
    testInfo: TestInfo
  ) => {
    // Create user in DB.
    const user = buildUserDocument(testSchoolId, userAccessLevel);
    await db.putDocuments(user);
    try {
      // Create page where user is already logged in.
      const authCookie = await getAuthCookie(user.EmailAddress);
      const context = await browser.newContext({
        storageState: authStorageState(authCookie),
      });
      const page = await context.newPage();

      page.on("pageerror", (e) => {
        const message = `Script error from page.
  Test File: ${testInfo.file} 
  Test: ${testInfo.title}
  Page URL: ${page.url()}
  Error Message: ${e.message}
  Stack: ${e.stack}`;
        throw new Error(message);
      });
      page.on("crash", () => console.log("Crash"));

      await use(page);

      // Close the page before we clean up DB.
      // Otherwise pending page requests will throw errors on server.
      page.close({ runBeforeUnload: false });
    } finally {
      await db.deleteDocument(user.id);
    }
  };
}

cullman1 commented 2 years ago

So the page is generating script errors occasionally from page.on - and the idea is to get the trace viewer to then view exactly what's going on. The tests always pass sequentially or in isolation, it's only when there's 100s of them running in parallel that we get the script errors.

cullman1 commented 2 years ago

I think there's two distinct issues here actually after more experimenting:

1) Blank zip files

2) The [pid=96][out] console.warn: SearchSettings: "get: No settings file exists, new profile?" (new NotFoundError("Could not open the file at C:\Users\chris.ullman\AppData\Local\Temp\playwright_firefoxdev_profile-6OhGb8\search.json.mozlz4", (void 0))) [pid=96][out] Crash Annotation GraphicsCriticalError: |[0][GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt (t=17.6852) [GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt

error which only occurs, when the trace is turned on. It seems to occur when there has been a script error and it's trying to launch Firefox, but failing.

mxschmitt commented 2 years ago

@cullman1 sorry for the late reply. Your test requires a lot of boilerplate - would it be possible that you could create a smaller repro maybe also in form of a GitHub repository? Otherwise its hard for us to investigate without being able to run the actual code.

cullman1 commented 2 years ago

Ok sure - I'm on it.

cullman1 commented 2 years ago

No clear reproducible recreation as yet - but I note the line that causes the error is this one in fixtures.ts:

const context = await browser.newContext({
        storageState: authStorageState(authCookie),
      });

so the failing tests aren't even being run, when they fail with that error, the browser is failing to create a new context. The authStorageSate looks like this:

function authStorageState(authCookie: string): StorageState {
  return {
    cookies: [
      {
        sameSite: "Lax",
        name: ".AspNet.ApplicationCookie",
        value: authCookie,
        domain: "localhost",
        path: "/",
        expires: -1,
        httpOnly: true,
        secure: false,
      },
    ],
    origins: [],
  };
}

This error only occurs when multiple tests are run in parallel, and only occurs with Firefox, never with Chromium. I'll look at isolating this particular aspect, as I think the tests themselves aren't the issue (as the failures happen in random tests, and then sometimes will all work).

cullman1 commented 2 years ago

I notice the same error - specifically RenderCompositorSWGL failed mapping default framebuffer

was logged with regards to Firefox 10 months ago and Puppeteer, and is seemingly unfixed:

https://bugzilla.mozilla.org/show_bug.cgi?id=1693011

cullman1 commented 2 years ago

Ok on the two problems.

1) The empty trace files, it seems that on the event of failures, it generates multiple trace files of the format trace-1.zip, trace-2.zip etc and only the last one is openable in the playwright viewer. So quite commonly I get 30 trace files for one test and only the last will open. Is that expected behaviour? I can recreate this predictably if it isn't.

2) The second one the intermittent: "RenderCompositorSWGL failed mapping default framebuffer" - this seems to be a problem with Firefox crashing when it first opens up in headless mode. I created 12 trivial tests of the following format and ran them concurrently with 4 workers and one would randomly fail once every 3 or 4 tests:

test("Test 1", async ({
  sysAdminUserPage: page,  
}) => {
  await page.goto("/sysadmin/maintenance");
  await page.waitForSelector("text=Schedule");
      });

and I'd get:

browserContext.newPage: Protocol error (Browser.newPage): Browser closed.
==================== Browser output: ====================
<launching> C:\Users\chris.ullman\AppData\Local\ms-playwright\firefox-1304\firefox\firefox.exe -no-remote -headless -profile C:\Users\CHRIS~1.ULL\AppData\Local\Temp\playwright_firefoxdev_profile-6OhGb8 -juggler-pipe -silent
<launched> pid=96
[pid=96][err] *** You are running in headless mode.
[pid=96][out]
[pid=96][out] Juggler listening to the pipe
[pid=96][out] console.warn: SearchSettings: "get: No settings file exists, new profile?" (new NotFoundError("Could not open the file at C:\\Users\\chris.ullman\\AppData\\Local\\Temp\\playwright_firefoxdev_profile-6OhGb8\\search.json.mozlz4", (void 0)))
[pid=96][out] Crash Annotation GraphicsCriticalError: |[0][GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt (t=17.6852) [GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt
[pid=96][err] JavaScript error: resource://gre/modules/XULStore.jsm, line 66: Error: Can't find profile directory.
[pid=96][err] JavaScript error: resource://gre/modules/XULStore.jsm, line 66: Error: Can't find profile directory.
[pid=96][err] JavaScript error: resource://gre/modules/XULStore.jsm, line 66: Error: Can't find profile directory.
[pid=96][err] JavaScript error: , line 0: TypeError: NetworkError when attempting to fetch resource.

     at Object.fn (C:\Users\chris.ullman\source\repos\insight\e2e-tests\fixtures.ts:148:37)

The same error has been observed with Puppeteer and Selenium, so I'm guessing there's not much you can do. My only query is there some settings in firefox than can be tweaked in firefoxUserPrefs or the launch options that might sidestep this? There was some querying about the WEBRENDER option being set to 0 in the past to avoid this (despite it being in headless mode), but Firefox have disabled this option and forced Web rendering on since 92.

So unless you need any more from me, feel free to close.

yury-s commented 2 years ago

The empty trace files, it seems that on the event of failures, it generates multiple trace files of the format trace-1.zip, trace-2.zip etc and only the last one is openable in the playwright viewer. So quite commonly I get 30 trace files for one test and only the last will open. Is that expected behaviour? I can recreate this predictably if it isn't.

This is odd, does it happen when you have a single failing test? Can you upload one of those blank traces (assuming the file sizes are not 0) ?

The second one the intermittent: "RenderCompositorSWGL failed mapping default framebuffer" - this seems to be a problem with Firefox crashing when it first opens up in headless mode. I created 12 trivial tests of the following format and ran them concurrently with 4 workers and one would randomly fail once every 3 or 4 tests:

This appears to be the bug in Firefox that you mentioned above and from the last comment in the tread it seems that there is no way to disable the failing component.

cullman1 commented 2 years ago

trace-1.zip

Here's one from a run where 6 files were created, all 6 had non-zero file sizes, only the last one is openable in trace viewer. (I've kept all of them in case needed).

yury-s commented 2 years ago

Can share other traces too? Looks like a bunch of tests was interrupted early and produced those traces.

cullman1 commented 2 years ago

trace-2.zip trace-3.zip trace-4.zip trace-5.zip trace.zip Here's the entirety of the tests including the working one trace-5.zip. It's not uncommon to have up to 30 traces in one folder.

nick commented 2 years ago

I found a similar issue. The problem ended up being in my test.beforeEach function... a precondition was failing that caused the new browser context to never even be created. This resulted in the empty trace.zip files you describe.

pavelfeldman commented 2 years ago

I hope nick's advice above helps. As long as we can't reproduce this issue, we can't act on it. If this still happens to you, could you open a new issue, link this one and include a link to a simple repo / zip that we could use to reproduce locally?

microsoft / playwright

[BUG] Trace viewer sometimes returns blank zip files when running "retain-on-failure" and stops recording on second test failure #10701