microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
https://playwright.dev
Apache License 2.0

[Feature] New test status for screenshot tests (something like passedWithDiff) #14485

Open bezyakina opened 2 years ago

bezyakina commented 2 years ago

Hi! We have screenshot tests that may produce a small pixel diff, or they may fail before even reaching the screenshot assertion. We would like to be able to retry a test when it failed for some reason other than a pixel difference, and we do not want to retry a test when the failure is caused by a pixel difference.

I see 2 options here:

  1. Add a new status for screenshot tests (something like passedWithDiff)
  2. Add the ability to enable/disable retries directly from within a test, as suggested in https://github.com/microsoft/playwright/issues/11584

Thank you for your attention)

aslushnikov commented 2 years ago

@bezyakina so you have a test like this:

test('should work', async ({ page }) => {
  await page.goto('https://example.com');
  await expect(page.locator('h1')).toContainText('title'); // 1
  await expect(page).toHaveScreenshot(); // 2
});

This test sometimes fails on //1, and sometimes fails on //2.

A few questions:

bezyakina commented 2 years ago

@aslushnikov The problem is that the tests run in a WebView on Android and use androidWebView.page(). On our mobile farm there are various devices on which we run these tests. Sometimes during a test (rarely, but it happens) it fails with an error before the screenshot check is reached:

Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20)
  1. We don't want to set maxDiffPixels separately for each test, because we may miss a bug if the pixel difference is ignored (per-assertion tolerance is sketched right after this list).
  2. We can intercept the error, but we can't retry the test if 0 retries are specified in the configuration. If we specify 1 retry in the configuration and a test fails because of a pixel difference, it will be retried anyway (we can't cancel the retry directly from the test). That's the problem. If we intercept the diff error and change the test status from failed to passed, then it becomes difficult to find the tests with a pixel diff in the HTML report.
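
For reference, the per-assertion tolerance mentioned in point 1 looks roughly like this (a minimal sketch; the URL, snapshot name and pixel value are illustrative):

import { test, expect } from '@playwright/test';

test('tolerates a small pixel diff', async ({ page }) => {
  await page.goto('https://example.com');
  // Per-assertion tolerance that point 1 above wants to avoid configuring test by test:
  await expect(page).toHaveScreenshot('home.png', { maxDiffPixels: 100 });
});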

Test example:

// Assumed imports (not part of the original snippet). AndroidApp, SearchWebAppPage,
// testCases and expectedSnapshotPath are the author's own helpers and are not shown.
import { _android as android } from 'playwright';
import type { AndroidDevice } from 'playwright';
import { test, expect } from '@playwright/test';

for (const testCase of testCases) {
  test.describe('The layout is equal to snapshot', () => {
    let device: AndroidDevice;
    let app: AndroidApp;
    let page: SearchWebAppPage;

    test.beforeEach(async () => {
      [device] = await android.devices();
      app = new AndroidApp(device);
      page = new SearchWebAppPage(device);

      await page.prepareBeforeLayoutDiffTest(testCase);
    });

    test.afterEach(async () => {
      await app.close();
    });

    test(`${testCase.testName}`, async ({}, testInfo) => {

      const actualSnapshot = await page.getFullPageSnapshot(
        testCase,
        `${testCase.testSlug}-test.png`,
        testInfo
      );

      expect.soft(actualSnapshot, 'The layout of the tested build is equal to the layout of the prod build').toMatchSnapshot(expectedSnapshotPath);
    });
  });
}

aslushnikov commented 2 years ago

@bezyakina why don't you want to retry tests in case of screenshot differences? Is that to save some time?

bezyakina commented 2 years ago

@aslushnikov Because we take long screenshots. Out of the box, full-page screenshots don't work correctly with mobile WebViews, so we have to scroll, take a screenshot, repeat, and at the end stitch all the screenshots into one big image. This operation can take up to a minute. Since the tests are stable, if the layout is broken there is no point in retrying the test, because we will get the same result.
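
For illustration, a minimal sketch of the scroll-and-stitch approach described above, assuming a regular Playwright Page; stitching the viewport-sized buffers into one image is left to an image library and is only hinted at here:

import type { Page } from '@playwright/test';

// Scroll through the page one viewport at a time and collect screenshots;
// the caller is expected to stitch the returned buffers vertically before comparing.
async function captureLongScreenshot(page: Page): Promise<Buffer[]> {
  const viewportHeight = page.viewportSize()?.height ?? 800;
  const totalHeight = await page.evaluate(() => document.body.scrollHeight);
  const chunks: Buffer[] = [];

  for (let offset = 0; offset < totalHeight; offset += viewportHeight) {
    await page.evaluate(y => window.scrollTo(0, y), offset);
    await page.waitForTimeout(100); // let lazy-loaded content settle
    chunks.push(await page.screenshot()); // viewport-sized screenshot
  }
  return chunks;
}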

aslushnikov commented 2 years ago

Since the tests are stable, if the layout is broken there is no point in retrying the test, because we will get the same result.

@bezyakina I see. So far test retries cannot be configured dynamically, and there are no immediate plans to do so.

The best workaround I can think of is to split the screenshot tests out into a separate project and configure it with a no-retries policy. Would that work?
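
A minimal playwright.config.ts sketch of that workaround, with illustrative project names and testMatch patterns: the screenshot project gets retries: 0 while the rest of the suite keeps its retries.

import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: 1, // default for the non-screenshot tests
  projects: [
    {
      name: 'functional',
      testMatch: /.*\.func\.spec\.ts/,
    },
    {
      name: 'screenshots',
      testMatch: /.*\.screenshot\.spec\.ts/,
      retries: 0, // never retry pixel-diff failures
    },
  ],
});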

bezyakina commented 2 years ago

@aslushnikov We did that. Our suite contains only screenshot tests. But when tests fail due to unstable device connections, we have to restart the entire test suite. The point was not to restart the entire suite, but to retry only the failed test directly.

pavelfeldman commented 2 years ago

I would probably try/catch the screenshot comparison. It will still produce the artifacts, but will not fail the execution. We can consider adding expect.trial to mitigate your use case more broadly.
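
A minimal sketch of that try/catch workaround: the screenshot assertion is wrapped so a pixel diff does not fail the test, and the mismatch is recorded as a test annotation so it stays visible in the report. The annotation type name is an assumption, not a Playwright convention.

import { test, expect } from '@playwright/test';

test('layout matches snapshot', async ({ page }, testInfo) => {
  await page.goto('https://example.com');
  try {
    await expect(page).toHaveScreenshot('home.png');
  } catch (error) {
    // Diff artifacts are still written to the output folder; record the
    // mismatch as an annotation without failing the test.
    testInfo.annotations.push({ type: 'screenshot-diff', description: String(error) });
  }
});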

dgozman commented 2 years ago

@bezyakina Did you try the try/catch workaround? If so, how was your experience? I am evaluating whether we should do expect.trial or not, and if we do, whether it would actually help you.

bezyakina commented 2 years ago

@dgozman Yes, I tried try/catch: if the error message contained a word like "diff", I changed the test status to passed. But this way it is difficult to separate the tests that passed from those that passed with a small diff in the HTML report.

mscottford commented 1 year ago

I created #23197, which might be a duplicate of this issue.

joergRossdeutscher commented 11 months ago

Totally agree with this feature. We're currently using Ghostinspector and already miss that feature a lot.

A failed screenshot is rarely something of high severity. Some order number, date/time, advertising, rendering difference, animated object, … has changed. However, just accepting a 95% match misses a lot of important stuff. A threshold is IMHO not useful at all and should always be 100%.

We are using this method:

Ghostinspector returns two results, Passed and ScreenshotPassed. Failing the first one is an error; failing the second one is only a warning.

In daily business we have a lot of findings in the screenshots, but around 95% of the „looks different" findings are false positives.
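
For those who, like the comment above, want the comparison to effectively require a 100% match, here is a minimal sketch of the global expect configuration in playwright.config.ts (the values are illustrative; the default threshold is more permissive):

import { defineConfig } from '@playwright/test';

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      maxDiffPixels: 0, // no differing pixels tolerated
      threshold: 0,     // strict per-pixel color comparison
    },
  },
});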