web-platform-tests / wpt

Test suites for Web platform specs — including WHATWG, W3C, and others
https://web-platform-tests.org/

Investigate webdriver failures on wpt-chrome-dev-stability action #38450

Open jcscottiii opened 1 year ago

jcscottiii commented 1 year ago

Background

This PR was stuck for a while because the wpt-chrome-dev-stability GitHub action failed on it. The PR changed some webdriver test files.

The errors seen in the PR include:

Logs 1
```
ERROR test_no_top_browsing_context - setup error: webdriver.error.UnknownErrorException: unknown error (500): unknown error: Chrome failed to start: crashed.
```

Logs 2
```
79:42.61 INFO STDOUT: E webdriver.error.WebDriverException: tab crashed (500): tab crashed
79:42.61 INFO STDOUT: E (Session info: chrome=111.0.5562.0)
79:42.61 INFO STDOUT: E
```

Logs 3
```
1:14.81 TEST_END: Test OK. Subtests passed 10/11. Unexpected 1
FAIL test_cross_origin[capabilities0] - webdriver.error.StaleElementReferenceException: stale element reference (404): stale element reference: element is not attached to the page document

session =
url = .url at 0x7efe191543a0>

    @pytest.mark.capabilities({"acceptInsecureCerts": True})
    def test_cross_origin(session, url):
        base_path = ("/webdriver/tests/support/html/subframe.html" +
                     "?pipe=header(Cross-Origin-Opener-Policy,same-origin")
        first_page = url(base_path, protocol="https")
        second_page = url(base_path, protocol="https", domain="alt")

        session.url = first_page
        session.url = second_page

        elem = session.find.css("#delete", all=False)

        response = back(session)
        assert_success(response)

        assert session.url == first_page

        with pytest.raises(error.NoSuchElementException):
>           elem.click()

base_path = '/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin'
elem =
first_page = 'https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin'
response = <[ValueError('Sign not allowed in string format specifier') raised in repr()] Response object at 0x7efe19090070>
second_page = 'https://not-web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin'
session =
url = .url at 0x7efe191543a0>

webdriver/tests/back/back.py:168:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tools/webdriver/webdriver/client.py:22: in inner
    return func(self, *args, **kwargs)
        args = ()
        func =
        kwargs = {}
        self =
        session =
tools/webdriver/webdriver/client.py:845: in click
    self.send_element_command("POST", "click", {})
        self =
tools/webdriver/webdriver/client.py:835: in send_element_command
    return self.session.send_session_command(method, url, body)
        body = {}
        method = 'POST'
        self =
        uri = 'click'
        url = 'element/237d3c14-e8d0-40e3-b669-6198f14f7f01/click'
tools/webdriver/webdriver/client.py:661: in send_session_command
    return self.send_command(method, url, body, timeout)
        body = {}
        method = 'POST'
        self =
        timeout = None
        uri = 'element/237d3c14-e8d0-40e3-b669-6198f14f7f01/click'
        url = 'session/3ef0c79b43712a6fbfb8ac9f25035771/element/237d3c14-e8d0-40e3-b669-6198f14f7f01/click'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , method = 'POST'
url = 'session/3ef0c79b43712a6fbfb8ac9f25035771/element/237d3c14-e8d0-40e3-b669-6198f14f7f01/click'
body = {}, timeout = None

    def send_command(self, method, url, body=None, timeout=None):
        """
        Send a command to the remote end and validate its success.

        :param method: HTTP method to use in request.
        :param uri: "Command part" of the HTTP request URL,
            e.g. `window/rect`.
        :param body: Optional body of the HTTP request.

        :return: `None` if the HTTP response body was empty, otherwise
            the `value` field returned after parsing the response
            body as JSON.

        :raises error.WebDriverException: If the remote end returns
            an error.
        :raises ValueError: If the response body does not contain a
            `value` key.
        """
        response = self.transport.send(
            method, url, body,
            encoder=protocol.Encoder, decoder=protocol.Decoder,
            session=self, timeout=timeout)

        if response.status != 200:
            err = error.from_response(response)

            if isinstance(err, error.InvalidSessionIdException):
                # The driver could have already been deleted the session.
                self.session_id = None

>           raise err
E           webdriver.error.StaleElementReferenceException: stale element reference (404): stale element reference: element is not attached to the page document
E             (Session info: chrome=111.0.5562.0)
E
E           Remote-end stacktrace:
E
E           #0 0x55bed92e4152
E           #1 0x55bed926ced3
E           #2 0x55bed8ff4741
E           #3 0x55bed8ff8009
E           #4 0x55bed8ff7d1a
E           #5 0x55bed8ff8087
E           #6 0x55bed902e930
E           #7 0x55bed9024093
E           #8 0x55bed90536c2
E           #9 0x55bed9023b42
E           #10 0x55bed905388e
E           #11 0x55bed906b0fe
E           #12 0x55bed9053463
E           #13 0x55bed9021fd2
E           #14 0x55bed902320c
E           #15 0x55bed92a3d6b
E           #16 0x55bed92ba7a4
E           #17 0x55bed92ba05f
E           #18 0x55bed92baf55
E           #19 0x55bed92a5c73
E           #20 0x55bed92bb2db
E           #21 0x55bed92957c7
E           #22 0x55bed92d85a8
E           #23 0x55bed92d86eb
E           #24 0x55bed92f47f6
E           #25 0x7fb931998609 start_thread
E           #26 0x7fb93143c133 clone
```

Logs 4
```
setup error: webdriver.error.UnknownErrorException: unknown error (500): unknown error: Chrome failed to start: crashed
```

To show that the failures were unrelated to the change, we created a PR that only touched whitespace in the same files. The same errors came up there, so we concluded that the original PR was safe to merge.

Risk of not resolving

Initial Hypotheses

  1. Something is wrong with a dependency in the Docker container.
    • Evidence: @nechaev-chromium and @sadym-chromium were unable to reproduce the error locally. [1] [2]
  2. There might be a real error in Chrome and webdriver.
    • Evidence: TBD

thiagowfx commented 1 year ago

I cannot submit several of my PRs because of this blocker. cc @foolip @jgraham (who force-merged previously): could you help investigate this?

Example PR: https://github.com/web-platform-tests/wpt/pull/40470
Logs: https://github.com/web-platform-tests/wpt/pull/40470/checks?check_run_id=14136304439

Another example PR: https://github.com/web-platform-tests/wpt/pull/40421
Logs: https://github.com/web-platform-tests/wpt/pull/40421/checks?check_run_id=14132619502

The failures are all in webdriver classic tests, which are unrelated to both PRs. You can confirm this by grepping for "FAIL" in the logs: every failure is under /webdriver/tests/classic.
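
For anyone who wants to repeat that check without eyeballing grep output, here is a rough sketch that tallies FAIL lines by test directory. It assumes the log announces each test with a `TEST_START: <path>` line before its results; the sample lines and the `failures_by_prefix` helper are made up for illustration and are not part of wpt.

```python
import re
from collections import Counter

# Assumption: each test is announced by a "TEST_START: <path>" line.
TEST_START = re.compile(r"TEST_START: (\S+)")


def failures_by_prefix(log_lines, depth=3):
    """Count FAIL lines, grouped by the first `depth` path components
    of the most recently started test."""
    counts = Counter()
    current = None
    for line in log_lines:
        match = TEST_START.search(line)
        if match:
            current = match.group(1)
            continue
        # Match FAIL as a whole word so "FAILED" or similar is not counted.
        if current is not None and "FAIL" in line.split():
            # "/a/b/c/d.py" splits to ["", "a", "b", "c", "d.py"],
            # so keep depth + 1 items to retain the leading slash.
            prefix = "/".join(current.split("/")[: depth + 1])
            counts[prefix] += 1
    return counts


# Hypothetical log excerpt, loosely modelled on the logs in this issue.
sample = [
    "0:01.00 TEST_START: /webdriver/tests/classic/back/back.py",
    "0:02.00 TEST_END: Test OK. Subtests passed 10/11. Unexpected 1",
    "FAIL test_cross_origin[capabilities0] - stale element reference",
    "0:03.00 TEST_START: /webdriver/tests/classic/new_session/create.py",
    "0:04.00 TEST_END: Test OK. Subtests passed 5/5.",
]

print(failures_by_prefix(sample))
```

If every key in the result starts with /webdriver/tests/classic, the failures are confined to that directory.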

thiagowfx commented 1 year ago

Actually, the error messages ask to tag a group instead of individuals, so let's do that:

These may be pre-existing or new flakes. Please try to reproduce (see the above WPT command, though some flags may not be needed when running locally) and determine if your change introduced the flake. If you are unable to reproduce the problem, please tag @web-platform-tests/wpt-core-team in a comment for help.

These may be pre-existing or newly slow tests. Slow tests indicate that a test ran very close to the test timeout limit and so may become TIMEOUT-flaky in the future. Consider speeding up the test or breaking it into multiple tests. For help, please tag @web-platform-tests/wpt-core-team in a comment.

cc @web-platform-tests/wpt-core-team

thiagowfx commented 1 year ago

I believe I understand the pattern. Whenever files in /webdriver/tests/support are touched, the CI fails because of the pre-existing failures.
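
A quick way to test that hypothesis against a given PR is to check whether any of its changed files sit under the support directory. The sketch below does only that path check; the `touches_support` helper and the example file list are hypothetical, and how the stability job actually selects affected tests is not modelled here.

```python
from pathlib import PurePosixPath

# Directory named in the comment above, relative to the repository root.
SUPPORT_DIR = PurePosixPath("webdriver/tests/support")


def touches_support(changed_files):
    """Return the changed files that live anywhere under SUPPORT_DIR."""
    return [
        path
        for path in changed_files
        if SUPPORT_DIR in PurePosixPath(path).parents
    ]


# Hypothetical list of changed files, e.g. from `git diff --name-only`.
changed = [
    "webdriver/tests/support/html/subframe.html",
    "dom/nodes/Node-cloneNode.html",
]

print(touches_support(changed))
```

A non-empty result would be consistent with the pattern described above.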

whimboo commented 1 year ago

Now that https://github.com/web-platform-tests/rfcs/pull/131 is merged, what does it mean for those jobs? Do those changes have to be applied now?

jcscottiii commented 1 year ago

Hey @whimboo! The next steps would be to implement that RFC. We are currently prioritizing our work. Once we have a timeline for that work we will comment on that RFC.

thiagowfx commented 1 year ago

@nechaev-chromium I believe you may have fixed some of those issues, with https://chromium-review.googlesource.com/c/chromium/src/+/4675633

I fixed some of those with https://github.com/web-platform-tests/wpt/pull/40887

thiagowfx commented 1 year ago

See also: https://github.com/web-platform-tests/wpt/issues/40990

whimboo commented 1 year ago

I was out for 3 weeks. @thiagowfx are those jobs more stable nowadays? If they still fail often, what else might be left to do? It's at least good to see that this crash has been fixed!

thiagowfx commented 1 year ago

Splitting the tests was overall helpful. They are more stable, but not completely. https://github.com/web-platform-tests/wpt/issues/41083 also needs to be fixed.

nechaev-chromium commented 1 year ago

We have fixed two causes of ConnectionRefusedError. The fixes should be available as of 117.0.5915.x.

whimboo commented 1 year ago

Is there any work left to do? Recently this job has looked pretty good. Though I'm not sure how often it still fails for PRs and landings that I don't watch.

thiagowfx commented 1 year ago

The question we should ask ourselves is: Do we still require admin merges to bypass this? If yes, then there's still work to do. I haven't merged any non-trivial PRs recently, @Lightning00Blade @OrKoN what's your experience in the last few weeks?

whimboo commented 1 year ago

FYI we have had quite a good experience lately with admin merge requests when asking the web-platform-tests/admins team directly. Someone should always be around to help.

past commented 1 year ago

Note that #40990 was fixed, so the wpt-chrome-dev-stability check should no longer block any PR.

whimboo commented 10 months ago

Note that there is also a bug in Chrome which causes an extra 100ms delay when trying to resize or reposition a window. With https://github.com/web-platform-tests/wpt/pull/43853 I'm going to add a workaround until it's fixed.

Once this PR lands, the Chrome wdspec tests should speed up drastically.