feat: better support for visual regression testing

aslushnikov commented 3 years ago

Playwright Test has a built-in toMatchSnapshot() method to power Visual Regression Testing (VRT).

However, VRT is still challenging due to variance in host environments. There are a number of measures we can take right away to drastically improve the experience in @playwright/test:

[ ] support for docker test fixture to run browsers inside docker image.
[ ] support for blur in matching snapshot to counteract antialiasing
[x] better UI for reviewing snapshot diffs

Interesting context:

migration from backstopjs to @playwright/test

thekp commented 1 year ago

I had different screenshots with antialiased fonts between my ArchLinux laptop and Ubuntu 20.04 in Docker (it's used by default by GitHub Actions). The following Chromium flags helped me to get identical screenshots: --font-render-hinting=none --disable-skia-runtime-opts --disable-font-subpixel-positioning --disable-lcd-text

@nikicat Where exactly did you add these flags in your playwright project?

nikicat commented 1 year ago

@thekp I pass them to playwright.chromium.launch(args=[...]) here (the chromium_flags() function is overridden inside a test).

draganpazin commented 1 year ago

We test screenshots of an add-in in the web version of MS Office 365 Excel. In some cases, the add-in is 1px bigger than in the original screenshot. It seems we cannot control this: MS Office decides the size, and it is not deterministic. The image diff is negligible and we could ignore it, but since the image sizes do not match, toMatchSnapshot fails. Currently we do not have a good workaround for this problem.

I would vote for toMatchSnapshot being able to compare images of different sizes.
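
For illustration only, here is a rough sketch of how a size-tolerant comparison could work outside of Playwright: pad both decoded images to a common size, then count the differing pixels. The `RawImage` type and the `padToSize` and `countDifferentPixels` helpers are hypothetical names, not Playwright APIs, and decoding the PNG files into raw RGBA bytes is assumed to happen elsewhere.

```typescript
// Hypothetical raw image: RGBA bytes, row-major, 4 bytes per pixel.
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array;
}

// Copy `img` onto a transparent canvas of the given size, anchored at the top-left corner.
function padToSize(img: RawImage, width: number, height: number): RawImage {
  const data = new Uint8Array(width * height * 4); // zero-filled, i.e. transparent black
  const copyWidth = Math.min(img.width, width);
  const copyHeight = Math.min(img.height, height);
  for (let y = 0; y < copyHeight; y++) {
    const srcStart = y * img.width * 4;
    const row = img.data.subarray(srcStart, srcStart + copyWidth * 4);
    data.set(row, y * width * 4);
  }
  return { width, height, data };
}

// Count pixels whose RGBA bytes differ after padding both images to a common size.
function countDifferentPixels(a: RawImage, b: RawImage): number {
  const width = Math.max(a.width, b.width);
  const height = Math.max(a.height, b.height);
  const pa = padToSize(a, width, height);
  const pb = padToSize(b, width, height);
  let different = 0;
  for (let i = 0; i < width * height; i++) {
    const o = i * 4;
    if (
      pa.data[o] !== pb.data[o] ||
      pa.data[o + 1] !== pb.data[o + 1] ||
      pa.data[o + 2] !== pb.data[o + 2] ||
      pa.data[o + 3] !== pb.data[o + 3]
    ) {
      different++;
    }
  }
  return different;
}
```

With this approach the extra 1px row or column counts as differing pixels, so the result could be checked against a small pixel budget instead of failing outright on the size mismatch. Whether that trade-off is acceptable is a judgment call for each suite.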

ayroblu commented 1 year ago

One of the things we noticed is that our focus is somewhat different between runs (imagine loading a page with a text input, sometimes the text input is focused, sometimes it isn't).

Wondering if there's anything we can do to improve reliability apart from just manually blurring and focusing

matthias-ccri commented 1 year ago

My discrepancy was resolved by passing the --disable-remote-fonts flag to chromium.

    projects: [
        {
            name: 'chromium',
            use: {
                ...devices['Desktop Chrome'],
                launchOptions: {
                    args: [
                        // Configure text rendering so there's no difference between headless and headed (when debugging).
                        '--font-render-hinting=none',
                        '--disable-skia-runtime-opts',
                        '--disable-system-font-check',
                        '--disable-font-subpixel-positioning',
                        '--disable-lcd-text',
                        '--disable-remote-fonts',
                    ],
                },
            },
        },
    ],

GuilleDF commented 1 year ago

Hi, posting a flaky screenshot due to font rendering:

The baseline creation and the test run were both done on the mcr.microsoft.com/playwright:v1.28.0-focal docker image, on mobile safari (device preset is iPad (gen 7)).

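Expected: https://user-images.githubusercontent.com/7784127/239858688-1c8b6ef5-e033-427e-abf0-a3ea02fa9746.png

Actual: https://user-images.githubusercontent.com/7784127/239858703-e7848cc1-986c-4ad8-98cf-d01df2c4ff78.png

Diff: https://user-images.githubusercontent.com/7784127/239858699-24bd8502-e069-4726-8a70-1dcc75be53a6.png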

mscottford commented 1 year ago

I wonder if some of the font rendering discrepancies might be because of local fonts being used instead of web fonts. For example, I have the Rambla font installed locally on my Mac, but my site also pulls that font in via CSS. In that case, consistently running the tests in an environment that does not have those fonts installed locally might address the problem. This might mean replacing the "expected" image with one from an environment that doesn't have the font installed.

deviantintegral commented 1 year ago

I've noticed an issue with webkit image rendering where it doesn't seem to be consistent. Look at the image of the flowers in this picture: a-basic-page-with-embed-images-ID-2008-1-expected

And this one:

a-basic-page-with-embed-images-ID-2008-1-actual

There are exactly 30 pixels that differ, and what's interesting is that when it fails, it's always 30 pixels.

a-basic-page-with-embed-images-ID-2008-1-diff

If you flip between the two images, one of them appears more aliased or slightly blurred or something. The image is a lossy webp image, so I suppose it could be that rendering the image isn't consistent?

Anyone know if this is expected - something like webkit rendering the image in stages? We're already waiting on the complete property so JavaScript and playwright consider the image loaded.

gselsidi commented 1 year ago

There are a few tickets covering all the visual stuff. If I remember correctly, there is an issue with WebKit rendering that is outside of Playwright's control.

In the end I just gave up on 100% pixel perfection and allowed a % of variance.

Even 95% accuracy is still a feat on its own; in reality you'll probably get to 99.9%, which is good enough.

But yeah, it would be cool to get 100%, so that if things degrade in the future we degrade from 100% as opposed to a starting point of 95%.

Also make sure you use docker if you are not always running against the same physical machine.

deviantintegral commented 1 year ago

There are a few tickets covering all the visual stuff. If I remember correctly, there is an issue with WebKit rendering that is outside of Playwright's control. In the end I just gave up on 100% pixel perfection and allowed a % of variance.

nods yeah, that's what I figured. I'm currently working around it by setting maxDiffPixels if the browserName is webkit. Hopefully we can maintain 100% pixel coverage in Chrome or Firefox.
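
A minimal sketch of that workaround, assuming a small helper that picks the tolerance per browser. The helper name `screenshotToleranceFor` and the budget of 30 are hypothetical (the number just mirrors the 30 differing pixels reported above); `maxDiffPixels` is passed through to Playwright's screenshot assertion.

```typescript
type BrowserName = 'chromium' | 'firefox' | 'webkit';

interface ScreenshotTolerance {
  maxDiffPixels?: number;
}

// Hypothetical budget; tune it to the smallest value that stops the false failures.
const WEBKIT_MAX_DIFF_PIXELS = 30;

// Allow a small pixel budget on WebKit only; other browsers stay pixel-exact.
function screenshotToleranceFor(browserName: BrowserName): ScreenshotTolerance {
  return browserName === 'webkit' ? { maxDiffPixels: WEBKIT_MAX_DIFF_PIXELS } : {};
}

// Usage inside a @playwright/test test (not executed here):
//   test('basic page', async ({ page, browserName }) => {
//     await expect(page).toHaveScreenshot(screenshotToleranceFor(browserName));
//   });
```

Keeping the budget in one place makes it easy to tighten later if WebKit rendering becomes stable.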

also make sure you use docker if not always running against the actual same physical machine

Good reminder. We're doing that with https://github.com/deviantintegral/ddev-playwright and the above screenshots are from running tests in a loop until they fail, all in the same environment.

pastelsky commented 1 year ago

There's a separate (related) issue regarding adding support for docker at https://github.com/microsoft/playwright/issues/20954 so that visual regression tests can run in a consistent environment and environment-related differences are negated.

It would be helpful to receive upvotes there from folks here if that's something you need.

mfucci-medable commented 10 months ago

I am encountering the same issue with chromium (and webkit at an even higher frequency, too high so we disabled it).

Version: Playwright 1.38.1 (the issue is also reproducible in 1.39.0)

Env: running in ubuntu:jammy on an Apple M1 Pro (the issue also happens in our Linux CI pipeline; running in docker makes it pixel perfect between local and CI)

What happens: About 5% of the time, one letter is randomly positioned incorrectly, always the same letter. On other screenshots, it might be 2-3 letters, sometimes in the middle of a word.

More info:

No network call, the css is inlined before the HTML.

Using chromium arguments (no improvements before / after enabling those arguments):

      '--font-render-hinting=none',
      '--disable-skia-runtime-opts',
      '--disable-system-font-check',
      '--disable-font-subpixel-positioning',
      '--disable-lcd-text',
      '--disable-remote-fonts',

My guess: this issue never happens on other screenshots that we are taking using exactly the same configuration, so it has to do with something in the HTML / CSS (that I am probably not allowed to share here)...

Actual / expected / diff (triggering here on the pseudo-locale test but might happen as well on the en-US version):

actual expected diff

viktor-avd commented 7 months ago

From maintainers

Hey folks! if you have examples of PNG screenshots that are taken on the same browser and same OS yet are different due to anti-aliasing issues, could you please attach the "expected", "actual" and "diff" images here?

This information will help with our experiments with fighting browser rendering non-determinism.

Hi, this appeared with the latest version, nothing like this happened before with the same code and configuration.

Playwright version: 1.41.1

Docker image: mcr.microsoft.com/playwright:v1.41.1-jammy

Chrome without args, and the same result with the following args:

    '--disable-skia-runtime-opts',
    '--disable-system-font-check',
    '--disable-remote-fonts',
    '--font-render-hinting=none',
    '--disable-font-subpixel-positioning',

Example 1:

expected

actual actual

diff diff

Example 2:

expected actual-rule

actual expected-rule

diff diff-rule

deviantintegral commented 7 months ago

An update from our experiences above: we found that increasing maxDiffPixels (or maxDiffPixelRatio) to a level that avoided false failures also let too many regressions slip through visual comparisons. However, the threshold option, as documented at https://playwright.dev/docs/api/class-pageassertions#page-assertions-to-have-screenshot-2, worked for us. Once we increased it from the default 0.2 to 0.3, we had no false failures or missed regressions.
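
As a rough sketch, the threshold can be set once for the whole project rather than per assertion. The constant name `expectConfig` is just an illustrative choice; the fragment is meant to be passed as the `expect` section of playwright.config.ts.

```typescript
// Fragment for the `expect` section of playwright.config.ts.
const expectConfig = {
  toHaveScreenshot: {
    // Per-pixel color difference tolerance between 0 (strict) and 1 (lax).
    // 0.2 is the default mentioned above; 0.3 is the value that worked in this case.
    threshold: 0.3,
  },
};

// Usage (not executed here):
//   import { defineConfig } from '@playwright/test';
//   export default defineConfig({ expect: expectConfig });
```

Unlike maxDiffPixels, this loosens how different a single pixel may be before it counts as changed, rather than how many changed pixels are allowed.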

microsoft / playwright

feat: better support for visual regression testing #8161
