web-platform-tests / wpt

Test suites for Web platform specs — including WHATWG, W3C, and others
https://web-platform-tests.org/

Firefox: false negatives in Docker container #14485

Open jugglinmike opened 5 years ago

jugglinmike commented 5 years ago

False negatives in Firefox/Docker container

Some tests fail in Firefox only when they are executed within a Docker container. For example:

$ xvfb-run --auto-servernum \
    ./wpt run --no-headless --yes --channel nightly --install-browser --log-mach - --log-mach-level DEBUG firefox css/selectors/focus-within-0*

Reports:

web-platform-test
~~~~~~~~~~~~~~~~~
Ran 22 checks (10 tests, 12 subtests)
Expected results: 22

By contrast, the equivalent run inside the Docker image:

$ docker run -it --privileged harjgam/web-platform-tests:0.22 \
    /bin/bash -c './start.sh "" "" "" firefox; \
      cd web-platform-tests;
      ./wpt run --no-headless --yes --channel nightly --install-browser --log-mach - --log-mach-level DEBUG firefox css/selectors/focus-within-0*'

Reports:

web-platform-test
~~~~~~~~~~~~~~~~~
Ran 22 checks (10 tests, 12 subtests)
Expected results: 13
Unexpected results: 9
  test: 9 (9 fail)

This is also apparent in the results published on wpt.fyi (using WPT commit 5349c6a632):

All of these tests use the reftest-wait class to signal readiness. The failure logs include Base64-encoded images that indicate the screenshot is being taken before that class is removed.
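To inspect those screenshots locally, they can be pulled out of a saved log and decoded to PNG files. The sketch below is a hypothetical helper, `extract_screenshots`, not part of wpt. It assumes the images appear in the log as `data:image/png;base64,...` URIs and that a `base64` supporting `-d` (e.g. GNU coreutils) is available.

```shell
#!/bin/sh
# extract_screenshots LOGFILE OUTDIR
# Hypothetical helper (not part of wpt): find every
# "data:image/png;base64,..." URI in LOGFILE and decode each one
# into OUTDIR/screenshot-N.png, numbered in order of appearance.
extract_screenshots() {
  log=$1
  outdir=$2
  mkdir -p "$outdir"
  n=0
  # grep -o prints each matching URI on its own line;
  # sed strips the URI prefix, leaving only the Base64 payload.
  grep -o 'data:image/png;base64,[A-Za-z0-9+/=]*' "$log" |
    sed 's/^data:image\/png;base64,//' |
    while IFS= read -r b64; do
      n=$((n + 1))
      printf '%s' "$b64" | base64 -d > "$outdir/screenshot-$n.png"
    done
}
```

Usage, with the log redirected to a file first: `extract_screenshots firefox-docker.log screenshots/`. Comparing a screenshot from the Docker run with one from the bare-metal run should show whether the capture happened before the test finished.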

I haven't been able to debug this very far because Firefox does not use the in-tree "wait" script designed for Marionette, but instead an alternate implementation bundled into the browser. I suspect this affects many more tests, but it's difficult to say without a better understanding of the problem.

@jgraham @whimboo would you mind taking a look?

Hexcles commented 5 years ago

There are some discrepancies between the two examples other than Docker: the first one used xvfb-run without specifying the screen spec (resolution and color depth), and the second one used a slightly older version of the image.

Although these differences don't seem crucial, Mike, could you try to eliminate the other variables as much as possible?

jugglinmike commented 5 years ago

Couldn't hurt, @Hexcles. Here's a more authentic version of the "bare metal" command:

export SCREEN_WIDTH=1280
export SCREEN_HEIGHT=1024
export SCREEN_DEPTH=24
export DISPLAY=:99.0
sudo Xvfb $DISPLAY -screen 0 ${SCREEN_WIDTH}x${SCREEN_HEIGHT}x${SCREEN_DEPTH} &
xvfb-run --auto-servernum \
  ./wpt run --no-headless --yes --channel nightly --install-browser --log-mach - --log-mach-level DEBUG firefox css/selectors/focus-within-0*

And here's the corresponding Docker command:

docker run -it --privileged harjgam/web-platform-tests:0.25 \
  /bin/bash -c './start.sh "" "" "" firefox; \
    cd web-platform-tests;
    ./wpt run --no-headless --yes --channel nightly --install-browser --log-mach - --log-mach-level DEBUG firefox css/selectors/focus-within-0*'

These changes did not affect the test results in either environment. Would you mind verifying from your machine?
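One caveat with the bare-metal command: as far as I know, xvfb-run always starts its own Xvfb server (on a free display number when --auto-servernum is given) rather than reusing $DISPLAY. That means the screen spec passed to the separately launched Xvfb on :99 may not be the one Firefox actually renders into. A minimal sketch, assuming xvfb-run's --server-args option, passes the spec to the server xvfb-run starts. The function names `xvfb_server_args` and `run_focus_within` are hypothetical.

```shell
#!/bin/sh
# Screen spec from the comment above.
SCREEN_WIDTH=1280
SCREEN_HEIGHT=1024
SCREEN_DEPTH=24

# xvfb_server_args: hypothetical helper that prints the Xvfb argument
# string for the configured screen, e.g. "-screen 0 1280x1024x24".
xvfb_server_args() {
  printf '%s\n' "-screen 0 ${SCREEN_WIDTH}x${SCREEN_HEIGHT}x${SCREEN_DEPTH}"
}

# run_focus_within: hypothetical wrapper running the same wpt command,
# but letting xvfb-run start its server with the screen spec instead of
# launching a separate Xvfb process.
run_focus_within() {
  xvfb-run --auto-servernum --server-args="$(xvfb_server_args)" \
    ./wpt run --no-headless --yes --channel nightly --install-browser \
    --log-mach - --log-mach-level DEBUG firefox css/selectors/focus-within-0*
}
```

Calling `run_focus_within` from the wpt checkout should then exercise the tests at the intended 24-bit depth in both environments, which removes one more variable from the comparison.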

whimboo commented 5 years ago

Sorry, but I'm not familiar with the reftest part of Marionette. @jgraham and @gsnedders are probably the best people to talk to at the moment.

jgraham commented 5 years ago

https://searchfox.org/mozilla-central/source/testing/marionette/listener.js#1641 waits for the reftest-wait class to be removed, and after that we check whether there are pending paints before returning. So the bug is at least not trivial.

I strongly suspect that the problem here is actually related to focus. https://hg.mozilla.org/try/raw-rev/91724447c60cefbf672e607ecb91fe4070a4de52 is an attempt to ensure that focus is correctly set, although I'm not sure that what's there is sufficient. Could you repeat your test with that patch applied?

gsnedders commented 5 years ago

Can you use the tbpl logger so that screenshots get logged? Like @jgraham, I suspect this is just the racy focus bug again.
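A sketch of that request, assuming wpt exposes mozlog's tbpl formatter as --log-tbpl and that it takes a file path (or - for stdout) like the --log-mach option used above. It also assumes that failing lines in the tbpl output have the shape "TEST-UNEXPECTED-FAIL | <test> | <message>". `run_with_tbpl` and `unexpected_failures` are hypothetical helpers, not wpt commands.

```shell
#!/bin/sh
# run_with_tbpl LOGFILE: hypothetical wrapper for the same focus-within
# run as above, additionally writing a tbpl-formatted log to LOGFILE.
run_with_tbpl() {
  xvfb-run --auto-servernum \
    ./wpt run --no-headless --yes --channel nightly --install-browser \
    --log-mach - --log-tbpl "$1" firefox css/selectors/focus-within-0*
}

# unexpected_failures LOGFILE: print each test id that the tbpl log marks
# as an unexpected failure, once, sorted. Assumes lines of the form
# "TEST-UNEXPECTED-FAIL | <test> | <message>".
unexpected_failures() {
  grep 'TEST-UNEXPECTED-FAIL |' "$1" |
    awk -F ' [|] ' '{ print $2 }' |
    sort -u
}
```

Running `run_with_tbpl docker.tbpl.log` in the container and `run_with_tbpl host.tbpl.log` on the host, then diffing the output of `unexpected_failures` for each, would give the list of tests whose screenshots are worth comparing.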