microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
https://playwright.dev
Apache License 2.0
66.52k stars 3.64k forks source link

[REGRESSION]: Webkit timer become inaccurate since playwright build 1943 #28567

Closed nekolab closed 10 months ago

nekolab commented 10 months ago

Context:

Code Snippet Please clone this repo and follow README to reproduce the bug: https://github.com/nekolab/pw-webkit-timer-bug-demo

Describe the bug We have encountered test failures in our internal CI pipeline following the update to Playwright 1.40.0, persisting in the latest 1.41.1 version. The issue specifically affects Webkit browsers and is reproducible only on Ubuntu builds (macOS builds function correctly, and it has not been tested on Windows).

Upon investigation, we identified that the root cause is linked to on-page timers (e.g., those scheduled by setTimeout) being inaccurate. In extreme cases, a timer set for 1 second on the page was observed to take up to 8 seconds in real time. This discrepancy leads to timeouts in expect calls, resulting in test failures.

Tracing the regression to a specific part of the tested page that might be causing the timer irregularity has proven challenging. We conducted extensive internal testing with various page combinations to provide a minimal reproducible example but found it difficult to isolate a single factor that affects the timer.

Our current findings on the conditions for reproducing the issue are as follows:

Please refer to the readme in the demo repo for more details. Attached a snippet of the log when the issue occurs below. As you can see, the timer behavior is extremely inaccurate.

Running 1 test using 1 worker

     1 [webkit] › filter.spec.ts:3:1 › test filter
[1702205785482] On page reference one second tick: 1000
=====================================
[1702205786093] filterQuery changed to: name=foo
[1702205786093] Start on page reference timer
[1702205786598] On page reference one second tick: 1116         <===== See here
[1702205788346] On page reference timer 1000ms = 2253ms         <===== See here
[1702205788347] On page reference one second tick: 1749         <===== See here
[1702205789393] On page reference one second tick: 1046
[1702205790210] filterQuery changed to: 
[1702205790210] Start on page reference timer
[1702205790416] On page reference one second tick: 1023
[1702205791212] On page reference timer 1000ms = 1002ms
[1702205791416] On page reference one second tick: 1000
[1702205792420] On page reference one second tick: 1004
mxschmitt commented 10 months ago

Thanks for the great bug and Playwright bisect. I bisected it down inside WebKit:

PW RANGE: https://github.com/microsoft/playwright-browsers/commit/7365d87ed9170a1f7009fa2ed358adce8b8d7b59 WK RANGE: https://github.com/webkit/webkit/compare/4c22d8ce00c4a3eaf75085b3fbf5f1ebe1ebf8fd...e5bf2b80f8e57e5186df3c301de3259c6e1e812c -> WK COMMIT 4c22d8ce00c4a3eaf75085b3fbf5f1ebe1ebf8fd GOOD -> WK COMMIT e5bf2b80f8e57e5186df3c301de3259c6e1e812c BAD WK COMMIT which caused it: https://github.com/WebKit/WebKit/commit/2bf118e5bf78aaccfb3dcb3de85a7a84ad0a6279

mxschmitt commented 10 months ago

I can confirm that it seems fixed in 1956 (https://github.com/microsoft/playwright/pull/28678) after it got reverted upstream in https://github.com/WebKit/WebKit/commit/ffce61a305af851682f8f8d9d6e2f8a54c610fb2. We'll release it as part of 1.41.0

carlosgcampos commented 7 months ago

We are re-considering the patch to wait for the rendering in the main thread, instead of the compositing thread, but we don't want to break this again. We think that maybe the run loop source priorities can affect this, so would it be possible that you try again with https://github.com/WebKit/WebKit/pull/26181 and https://github.com/WebKit/WebKit/pull/26192?

yury-s commented 7 months ago

@nekolab can you restore the repro from https://github.com/nekolab/pw-webkit-timer-bug-demo so that we could try it with the new changes that Carlos mentioned above?

nekolab commented 7 months ago

Hi @yury-s The repo is unarchived

yury-s commented 7 months ago

@nekolab thanks! It looks like the deps are broken in the repo now, do you know if there is an easy fix?

$ npm i 
npm ERR! code ERESOLVE
npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR! 
npm ERR! While resolving: pw-webkit-ubuntu-bad-timer@0.0.0
npm ERR! Found: @kong/kongponents@9.0.0-pr.1804.26905eed.0
npm ERR! node_modules/@kong/kongponents
npm ERR!   dev @kong/kongponents@"9.0.0-pr.1804.26905eed.0" from the root project
npm ERR! 
npm ERR! Could not resolve dependency:
npm ERR! peer @kong/kongponents@"9.0.0-alpha.58" from @kong-ui-public/app-layout@2.1.3
npm ERR! node_modules/@kong-ui-public/app-layout
npm ERR!   dev @kong-ui-public/app-layout@"2.1.3" from the root project
npm ERR! 

Update: running

  1. yarn
  2. yarn dev --force also fails
nekolab commented 7 months ago

@yury-s Ahh it seems like the dependency is gone. I've updated the repo to match the latest package. Please try again and let me know if it helps

yury-s commented 7 months ago

@carlosgcampos just ran the test (bumped toHaveText timeout to 15s) with the two patches applied (see https://github.com/yury-s/WebKit/tree/timer-experiment). The timer related output seems fine:

Running 1 test using 1 worker

  ✘  1 [webkit] › filter.spec.ts:10:1 › test filter (17.1s)
  pw:browser <launching> /home/yurys/playwright-browsers/browser_patches/webkit/pw_run.sh --inspector-pipe --headless --no-startup-window +0ms
  pw:browser <launched> pid=559427 +9ms
=====================================
[1711399507355] filterQuery changed to: name=foo
[1711399507355] Start on page reference timer
[1711399507421] On page reference one second tick: 1001
[1711399508370] On page reference timer 1000ms = 1015ms
[1711399508421] On page reference one second tick: 1000
[1711399509423] On page reference one second tick: 1002
[1711399510423] On page reference one second tick: 1000
[1711399511424] On page reference one second tick: 1001
[1711399512424] On page reference one second tick: 1000
[1711399513425] On page reference one second tick: 1001
[1711399514426] On page reference one second tick: 1001
[1711399515426] On page reference one second tick: 1000
[1711399516427] On page reference one second tick: 1001
[1711399517428] On page reference one second tick: 1001
[1711399518429] On page reference one second tick: 1001
[1711399519429] On page reference one second tick: 1000
[1711399520430] On page reference one second tick: 1001
[1711399521430] On page reference one second tick: 1000
[1711399522434] On page reference one second tick: 1004
  pw:browser [pid=559427] <gracefully close start> +17s
  pw:browser [pid=559427] <process did exit: exitCode=0, signal=null> +72ms
  pw:browser [pid=559427] starting temporary directories cleanup +1ms
  pw:browser [pid=559427] finished temporary directories cleanup +2ms
  pw:browser [pid=559427] <gracefully close end> +0ms
carlosgcampos commented 6 months ago

Thanks!