microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
https://playwright.dev
Apache License 2.0
65.68k stars 3.58k forks

[Question] Visual testing in docker on different CPU architecture #13873

Open nickofthyme opened 2 years ago

nickofthyme commented 2 years ago

Hey team 👋🏼

I am working to migrate my puppeteer visual regression testing to playwright.

My team has people working on Macs with either arm64 (M1 SoC) or amd64 (Intel) CPU architecture.

I'd like a way to run and update playwright tests/screenshots locally from either architecture and have the local screenshots match the screenshots generated in the CI (linux/amd64).

Currently we use the mcr.microsoft.com/playwright:vx.x.x-focal docker image to run tests both locally and in the CI. However, running on these different architectures produces screenshots that are ever so slightly different, with virtually imperceptible differences.

Screenshot from M1 mac - arm64

image

Screenshot from Intel mac - amd64

image

Diff screenshot

image

Diff gif

Screen Recording 2022-05-03 at 07 27 31 AM


So my question is, does anyone have a good strategy to avoid the above errors on these two architectures without reducing the threshold?

I've tried running docker with --platform=linux/amd64 on my M1 mac, but I run into https://github.com/microsoft/playwright/issues/13724#issuecomment-1112358113 when running the tests, even on the latest docker version (v20.10.8) with Rosetta 2 installed. Sounds like this could just be a known issue with docker.
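
To make the threshold trade-off in the question concrete, here is a minimal sketch of how a pixel-ratio tolerance behaves. It is not Playwright's comparator; `diffPixelRatio` and `channelTolerance` are hypothetical names introduced only for illustration, and the buffers are assumed to be raw RGBA data of identical dimensions. Playwright's own `toHaveScreenshot` exposes related knobs (`threshold`, `maxDiffPixels`, `maxDiffPixelRatio`), but its internal comparison differs from this naive per-channel check.

```typescript
// Illustrative only: NOT Playwright's comparator.
// Returns the fraction of pixels in which any RGBA channel differs by more
// than `channelTolerance` between two same-sized RGBA buffers.
function diffPixelRatio(
  a: Uint8Array,
  b: Uint8Array,
  channelTolerance: number = 0,
): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error('buffers must be same-sized RGBA data');
  }
  const pixelCount = a.length / 4;
  if (pixelCount === 0) return 0;

  let differing = 0;
  for (let i = 0; i < a.length; i += 4) {
    // A pixel counts as different as soon as one channel exceeds the tolerance.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > channelTolerance) {
        differing++;
        break;
      }
    }
  }
  return differing / pixelCount;
}

// Two pixels; the first differs by 1 in the red channel only.
const expectedPixels = new Uint8Array([255, 0, 0, 255, 0, 0, 0, 255]);
const actualPixels = new Uint8Array([254, 0, 0, 255, 0, 0, 0, 255]);

const strictRatio = diffPixelRatio(expectedPixels, actualPixels); // 0.5
const tolerantRatio = diffPixelRatio(expectedPixels, actualPixels, 1); // 0
```

The sketch shows why anti-aliasing-level differences like the ones above fail a strict comparison even though they are invisible to the eye, and why loosening the tolerance hides them at the cost of also hiding small real regressions.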

markov00 commented 1 year ago

Hi @aslushnikov @nickofthyme, a few things I've noticed here:

Actually, that "red dot" is a dash of the red dashed line. It is really strange that one architecture renders it and the other does not. It looks to me like a different algorithm: x86 renders only full dashes, whereas ARM always renders them and cuts them at the edges if needed.

The layout differences can only be due to a difference in how text metrics are computed in SVG. I believe this is something that can't be fixed with a comparator and is specific to the browser's underlying implementation. I have already tested multiple times how different browsers measure the same text with the same font differently, and I strongly believe this is the reason for these failures.

I also noticed different text ligatures for Arabic across charts; this needs more investigation.

The different icon color: this one is strange. I will investigate more, but when testing on two real machines (x86 and ARM) with the same browser and the same OS, they render correctly with the same SVG fill color.

gajus commented 11 months ago

@aslushnikov It would be nice if this option were exposed behind an experimental option, or in some other way that does not require disabling linting.

await expect(page).toHaveScreenshot({
-  // https://github.com/microsoft/playwright/issues/20097#issuecomment-1382672908
-  // @ts-expect-error experimental feature
-  _comparator: 'ssim-cie94',
+  experimental: { comparator: 'ssim-cie94' },
   clip: box,
   fullPage: true,
 });
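
Until something like the proposed option exists, one possible workaround is to confine the private option to a single helper so individual tests need no lint suppressions. This is only a sketch: `withComparator` and `ExperimentalComparator` are hypothetical names, not part of Playwright, and it assumes Playwright keeps honoring the undocumented `_comparator` key quoted above.

```typescript
// Hypothetical helper, not part of Playwright: adds the private
// `_comparator` key in one place so call sites stay free of @ts-expect-error.
type ExperimentalComparator = 'ssim-cie94';

function withComparator<T extends object>(
  options: T,
  comparator: ExperimentalComparator = 'ssim-cie94',
): T & { _comparator: ExperimentalComparator } {
  // Copy the caller's options and append the experimental key.
  return { ...options, _comparator: comparator };
}

const screenshotOptions = withComparator({ fullPage: true });

// Intended use inside a test (assumes `page` and `box` exist there):
//   await expect(page).toHaveScreenshot(withComparator({ clip: box, fullPage: true }));
```

Because the helper returns a value rather than an object literal, TypeScript's excess-property check should not flag the extra key at the call site, though that has not been verified against every Playwright version.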

sweetcv commented 9 months ago

Hey folks, we're using the experimental { comparator: 'ssim-cie94' } to address some inconsistencies in WebKit visual testing, but we've hit an issue on Firefox Desktop (only) that we can't handle. The comparator gives us a false comparison result for these images.

dashboard-1-actual dashboard-1-expected dashboard-1-diff

UPD1: We handled the difference in the box-shadow, but there is still a comparison error producing this diff for the logo.

image