Discussion - Githubissues

foolip commented 7 years ago

Filing an issue to have the discussion in public.

Which was the test that this flakiness was seen on? In IRC I see http://s92097608.onlinehome.us/tmp/17.02.17-flaky-reftest.html, is it http://w3c-test.org/html/rendering/replaced-elements/attributes-for-embedded-content-and-images/img-dim.html that you minified?

Looks like https://github.com/w3c/wptrunner/blob/master/wptrunner/executors/reftest-wait_webdriver.js is the "wait" script and that this test doesn't use the "reftest-wait" mechanism.

I don't know WebDriver, but are the sequence of commands https://github.com/w3c/wptrunner/blob/38435bc6714ae83bbf759b04395fe13f08388396/wptrunner/executors/executorselenium.py#L253-L264 intended to wait until after the load event? (The load event on the window object is delayed until images are loaded.)

How/where is webdriver.get(url) implemented?

Even given the note in https://github.com/jugglinmike/chrome-screenshot-race/blob/master/LICENSE my guess is that there isn't anything that really guarantees the timing?

@jgraham

foolip commented 7 years ago

@samuong for ChromeDriver. Is it correct to assume that it's perfectly possible and valid to use ChromeDriver to execute async scripts before the page has finished loading? https://w3c.github.io/webdriver/webdriver-spec.html has things like "pageLoadStrategy", but this is all unfamiliar to me so I don't know where to start :)

jgraham commented 7 years ago

By default it is supposed to wait for the page to finish loading. However we also check the readyState attribute (the reftest-wait script is run for each page to detect if it has the class=reftest-wait attribute on <body>). So if the screenshot is happening before the load event something is very broken somewhere. Isn't it more likely that Chrome is delaying paint until after load? Or is that not a thing that happens?

foolip commented 7 years ago

Oh, I didn't pay attention to what readyState "complete" actually means. @jugglinmike, can you confirm that the load event is in fact fired, perhaps add some bit of red that's removed in the load event? Also, can you check if the img.naturalWidth has the right value?

If it looks to scripts like everything has loaded normally and the problem still reproduces on the same runs where scripts verify this, then a repaint problem of some kind like @jgraham suggests seems likely.

jugglinmike commented 7 years ago

Sure thing, @foolip. I've extended the test script to capture more data. Here's a screenshot of the results:

2017-02-21-rendering-info

As you can see, the ready event has indeed fired and the <img> element's naturalHeight and naturalWidth report the size of the fully-rendered image. This is all true prior to capturing the screenshot for both the "fully rendered" case and the "not rendered" case.

samuong commented 7 years ago

ChromeDriver will wait for pages to load before executing both the screenshot and the execute async script commands. It then hands it off to DevTools (Page.captureScreenshot) to take the screenshot.

It's interesting that the naturalHeight/Width of the image is out of sync with what's being captured. I'll ask around within DevTools to see who can look into Page.captureScreenshot.

jugglinmike commented 7 years ago

From my local testing, this "appears to be avoidable" by forcing a repaint in the onload event handler, as in:

--- a/index.html
+++ b/index.html
@@ -1,14 +1,15 @@
 <!DOCTYPE html>
 <html>
 <head>
   <script>
 window.ONLOAD_FIRED = null;
 onload = function() {
+  document.body.style.backgroundColor = 'hotpink';
   window.ONLOAD_FIRED = 'the `onload` event fired';
 };
   </script>
 </head>
 <body style="outline: solid teal 2px;">
   <img src="blue.png" style="border-width:50px; border-style:solid;"/>
 </body>
 </html>

With that change in place, the image is rendered consistently through thousands of trials. However, even if "appears to be avoidable" were an acceptable resolution, that pattern isn't generally usable for Web Platform Tests--it's unreasonable to expect all reference tests to include the boilerplate logic just for the sake of Chromium.

A related-but-more-tenable approach would be to force that repaint in the WebDriver wait script itself:

--- a/main.py
+++ b/main.py
@@ -87,40 +87,41 @@ def take_reading(session_id):
         script = '''
             var callback = arguments[0];
             function done(strategy) {
+              document.body.style.backgroundColor = 'hotpink';
               callback({
                 callbackStrategy: strategy,
                 readyState: document.readyState,
                 scrollHeight: document.documentElement.scrollHeight,
                 onload_fired: window.ONLOAD_FIRED,
                 natural_dimensions: [
                   document.getElementsByTagName('img')[0].naturalWidth,
                   document.getElementsByTagName('img')[0].naturalHeight
                 ]
               });
             }
             if (document.readyState === 'complete') {
               done('synchronous');
             } else {
               onload = done.bind(null, 'asynchronous');
             }
         '''
         result = request('post', 'session/%s/execute_async' % session_id, dict(script=script, args=[]))

This does not solve the problem. It actually makes it more pronounced, somehow, raising further questions about the paint/repaint cycle:

screenshot from 2017-02-21 13-06-39

What we're seeing here is that the repaint is not taking effect synchronously... Or, if it is, the frame buffer is not updating before the screen shot is captured. But that may not be accurate at all; I'm really stretching my knowledge of the rendering stack at this point.

samuong commented 7 years ago

Chrome should already force a redraw before it captures a screenshot: https://cs.chromium.org/chromium/src/content/browser/renderer_host/render_widget_host_impl.cc?l=1454. I think there's something else going wrong here...

jugglinmike commented 7 years ago

@foolip Sorry, I forgot to answer your original question. This is related to that test, but I found a whole set of similar tests to behave inconsistently in Chrome:

/html/rendering/replaced-elements/attributes-for-embedded-content-and-images/img-dim.html
/html/rendering/replaced-elements/attributes-for-embedded-content-and-images/img_border_percent.xhtml
/html/rendering/replaced-elements/attributes-for-embedded-content-and-images/object_border_perc.xhtml
/html/rendering/non-replaced-elements/the-page/body_text_00ffff.xhtml
/html/rendering/non-replaced-elements/phrasing-content-0/font-element-text-decoration-color/001-a.html
/html/rendering/non-replaced-elements/phrasing-content-0/font-element-text-decoration-color/001-q.html
/html/rendering/replaced-elements/images/space.html
/html/rendering/replaced-elements/attributes-for-embedded-content-and-images/object_border_pixel.xhtml

There are other inconsistent reference tests that may be related. These do not concern rendering directly, but they do rely on image rendering:

/html/semantics/embedded-content/the-img-element/document-base-url.html
/html/semantics/embedded-content/the-video-element/video_dynamic_poster_absolute.htm
/html/semantics/forms/the-input-element/image01.html

foolip commented 7 years ago

The first experiment https://github.com/jugglinmike/chrome-screenshot-race/issues/1#issuecomment-281428002 demonstrates clearly that the load event is in fact being fired before the screenshot is captured.

However, it is perhaps not a given that the layout, even if invalidated, is actually recomputed before the screenshot is taken. While a script calling img.getBoundingClientRect() or similar would force the layout, perhaps in the absence of such a call the layout might happen on the animation frame after the load event? @eaenet, would you happen to know the sequence of events from the size of an image becoming known to the layout being updated and affecting the paint?

foolip commented 7 years ago

Chrome should already force a redraw before it captures a screenshot: https://cs.chromium.org/chromium/src/content/browser/renderer_host/render_widget_host_impl.cc?l=1454. I think there's something else going wrong here...

RenderViewImpl::OnForceRedraw(), which handles that message, calls RenderWidget::ScheduleCompositeWithForcedRedraw(). Are you sure there isn't something asynchronous about that, such that the the semantics "paint real soon now" and not "paint now"?

jugglinmike commented 7 years ago

At @rbyers's suggestion, I modified the "wait" script to defer for the next animation frame using requestAnimationFrame:

--- a/main.py
+++ b/main.py
@@ -116,7 +116,11 @@ try:
               });
             }
             if (document.readyState === 'complete') {
-              done('synchronous');
+              requestAnimationFrame(function() {
+                done('synchronous');
+              });
             } else {
               onload = done.bind(null, 'asynchronous');
}

This modification did not address the instability: I observed the race condition after 10,723 trials. Rick anticipated this, though, and explained how the semantics of the requestAnimationFrame function meant that we would actually have to apply it twice to accurately describe the condition we are interested in here:

--- a/main.py
+++ b/main.py
@@ -116,7 +116,11 @@ try:
               });
             }
             if (document.readyState === 'complete') {
-              done('synchronous');
+              requestAnimationFrame(function() {
+                requestAnimationFrame(function() {
+                  done('synchronous');
+                });
+              });
             } else {
               onload = done.bind(null, 'asynchronous');
}

This ran for almost three hours without exhibiting the race condition.

While this is encouraging, it is by no means conclusive. There is no indication that the delay introduced here is in any way related to the delay in the browser's rendering behavior. We are at the limits of what guesswork can tell us, and we really need a more authoritative voice to conclusively describe an deferral technique based on the Chromium source code.

RByers commented 7 years ago

Yeah, we'll get a Chrome rendering expert to investigate and get back to you. But in the interim I do think it's a reasonable workaround, IIRC some blink LayoutTests follow such a double-rAF pattern already, eg. here.

We recommend a different internal API which probably has stronger guarantees than just a double-rAF. Eg. it may be possible for rAF to fire while raster is still pending, so you could possible see "checkerboarding" (which today is just white, not an actual checkerboard pattern). But I suspect the WebDriver screenshot API blocks on raster (at least it should - probably doesn't make to generate a screenshot of something that hasn't finished rastering yet).

foolip commented 7 years ago

@chrishtr, can you help us take a look here? I would be interesting to start from the question I posed to @eaenet in https://github.com/jugglinmike/chrome-screenshot-race/issues/1#issuecomment-281908124 about the order of events here.

In Document::checkLoadEventSoon() I see that there's just a timer that's started to possibly fire the load event. ImageLoader seems to use that (indirectly), and if that timer is started at the same time as the layout is invalidated, then I don't think there's any guarantees about order, the load event can come before the layout. Could the paint that we use for the screenshot happen between the load event and the layout? Or do we never paint anything if the layout is dirty?

chrishtr commented 7 years ago

Blink never paints with dirty layout.

I'm trying to repro locally now.

chrishtr commented 7 years ago

I am not yet able to reproduce this bug. I tried the steps in https://github.com/jugglinmike/chrome-screenshot-race/blob/master/readme.md, but when I run

python main.py

it errors out with:

Traceback (most recent call last): File "main.py", line 91, in session_id = request('post', 'session', dict(desiredCapabilities={}))['sessionId'] File "main.py", line 20, in request response = getattr(requests, method)('http://localhost:4444/%s' % uri, json=data) File "/usr/lib/python2.7/dist-packages/requests/api.py", line 88, in post return request('post', url, data=data, kwargs) File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request return session.request(method=method, url=url, kwargs) TypeError: request() got an unexpected keyword argument 'json'

As far as I can tell I have installed the "requests" module correctly.

I also tried putting https://github.com/jugglinmike/chrome-screenshot-race/blob/master/index.html into the Blink Layout Test machinery, and was unable to make it fail in 1000 trials. (not that layout tests are the same as Chrome Driver, just one data point).

foolip commented 7 years ago

@jugglinmike, just to ensure that we're not chasing after something that was recently fixed, can you try to repro also in Chrome Canary, and if successful note the exact version in case someone wants to try building and testing from the same source?

jugglinmike commented 7 years ago

Thanks for taking a look, @chrishtr!

As far as I can tell I have installed the "requests" module correctly.

I'm not sure what could be going wrong here, either. I've just removed and re-installed that module and the script still functions on my end. To be safe, I've updated the installation instructions to use the PIP tool and provided a requirements.txt file specifying the exact version of requests that I am using. You probably don't have to use PIP, but could you verify that the version of the library matches?

@jugglinmike, just to ensure that we're not chasing after something that was recently fixed, can you try to repro also in Chrome Canary, and if successful note the exact version in case someone wants to try building and testing from the same source?

That's a good idea, Philip. I'm unable to run Chrome Canary (no suitable Windows machine), but I've verified using Chrome's latest "dev" release, (58.0.3026.3) which appears to have been published last Wednesday. I've updated the project's readme.md file to include this information along with additional details about my system architecture and version of ChromeDriver (2.27.440175, which appears to be the latest at present).

chrishtr commented 7 years ago

I can now get it to run with the updated instructions. Should I expect the test to automatically fail with a wrong screenshot? So far I haven't seen that. How long should it take?

jugglinmike commented 7 years ago

The script should capture an initial screenshot and then begin reloading the page in quick succession, capturing another screenshot each time, until an abberation is detected. It should then write a report to a file named results.html and exit.

Here are the results of my last three executions (taken today):

Captured 439 screen shots over 96.2317750454 seconds.
Captured 251 screen shots over 56.5082240105 seconds.
Captured 315 screen shots over 68.7743799686 seconds.

This problem may be related to available system resources. In case that information helps:

Processor     : 4x AMD Athlon(tm) II X4 640 Processor
Memory        : 8171MB

chrishtr commented 7 years ago

Ok now I can get it to fail. Next I'm having trouble setting a custom Chrome binary, can you help with that? I have found python code to set up a new driver, but your code is accessing via HTTP not the python library it appears.

jugglinmike commented 7 years ago

Sure thing. I just pushed an update that introduces a new command-line flag, --chrome-binary. The invocation should now look something like this:

$ python main.py --chrome-binary /custom/path/to/chrome

You can also use the new --help flag to verify.

jugglinmike commented 7 years ago

I've just verified that this behavior persists in a more recent build of Chrome: 58.0.3029.6, released yesterday.

jugglinmike commented 7 years ago

Hi @chrishtr, is the new --chrome-binary flag working for you?

chrishtr commented 7 years ago

Hi,

Yes it worked. Sorry for the delay. I still haven't found time to debug further. This week I got caught up in some difficult other work. Hope to test on Monday.

jugglinmike commented 7 years ago

Got it. Just making sure it's still on your radar!

chrishtr commented 7 years ago

Apologies, I still haven't found time to debug this in any depth this week. I hope next week.

jugglinmike commented 7 years ago

Got it. Thanks for keeping us posted

chrishtr commented 7 years ago

FYI a similar/same bug got reported via a Google-internal channel, and a bisect to find the cause of that issue is underway...it may be that Chrome 56 regressed something. If the bisect fails, I'll try to bisect this issue directly next week, since I was able to reproduce it locally.

jugglinmike commented 7 years ago

Excellent! I'm excited to learn more :)

chrishtr commented 7 years ago

For a workaround in the meantime, the double-rAF method mentioned in a comment on Feb 23 sounds like it would work (and I agree it should help). Does it?

RByers commented 7 years ago

Yeah given that there are a bunch of questions about CSS test instability potentially impacted by this issue, I suggest we temporarily land the double-rAF workaround while we continue to investigate. @jugglinmike @gsnedders WDYT?

gsnedders commented 7 years ago

@jgraham @jugglinmike we should try and land that in wptrunner ASAP

jugglinmike commented 7 years ago

For a workaround in the meantime, the double-rAF method mentioned in a comment on Feb 23 sounds like it would work (and I agree it should help). Does it?

Yes, see my comment from that day. As I mentioned, there, though, without some concrete justification, that technique is no better than (for instance) setTimeout with a value of 100 milliseconds.

Yeah given that there are a bunch of questions about CSS test instability potentially impacted by this issue, I suggest we temporarily land the double-rAF workaround while we continue to investigate. @jugglinmike @gsnedders WDYT?

@jgraham @jugglinmike we should try and land that in wptrunner ASAP

I'm sensing some urgency here. I'm concerned that implementing an unsubstantiated workaround will not guarantee stability but that it will delay a verifiable solution still further. I doubt this is going to make me any friends, but an alternative would be to mark WPT's Chrome job as an "allowed failure" until this can be resolved.

gsnedders commented 7 years ago

@jugglinmike It's not unsubstantiated insofar as it forces a paint and guarantees a paint has occurred, though it is a hypothesis that it is failing to paint the image… I don't think this will get deprioritised as a result of the workaround landing (as it likely effects far more than just this), hence I don't think that's a reason not to land this.

Ideally, if we did make it an allowed failure, we'd do so for reftests only (which probably means in check_stability.py rather than at a Travis level).

jugglinmike commented 7 years ago

@chrishtr @RByers Can the internal bug mentioned above be made public? @jgraham has pointed out that it would make a much better reference point for documenting this workaround.

chrishtr commented 7 years ago

It cannot unfortunately, but I have filed this bug for you to use as a reference point instead:

https://bugs.chromium.org/p/chromium/issues/detail?id=708757

jugglinmike commented 7 years ago

Thank you, @chrishtr!

I discussed this with @gsnedders and @jgraham on IRC, and we're going to move forward with the requestAnimationFrame workaround for the short-term.

gsnedders commented 7 years ago

Note that this shouldn't de-prioritise the upstream issue, at least until we know that the workaround (i.e., the double requestAnimationFrame) will guarantee we won't hit the underlying bug. Of course, it's likely to affect more than just us, so it probably still matters for web developers generally.

Svennisen commented 7 years ago

@jugglinmike @gsnedders @chrishtr I think I am encountering this exact issue when updating the DOM and trying to take a capture after.

Using a mutation observer and the requestAnimationFrame fix works when I run my app under normal load. However when putting very high stress on CPU and GPU the capture will still run before the render has finished. Using chrome 58 in electron. We tested this because we have customers running on low end computers where this is an issue when their computers run virus scans.

Did you try your test under heavy CPU and GPU load? I used this CPU stress program to put load on the CPU (windows). Im fairly certain your double requestAnimationFrame solution for readyState complete will also fail when doing this.

Any progress on the internal bug mentioned related to this?

chrishtr commented 7 years ago

Please comment on crbug.com/708757. I don't think there has been any progress recently on it.

Svennisen commented 7 years ago

@chrishtr I just did. Good news! They are saying it is fixed and will be released in v60. Don't have access to that issue though so can't look into it further.

"I thought this issue was fixed, maybe. https://bugs.chromium.org/p/chromium/issues/detailid=382170 That would be in M-60 and will not be merged back. So going to Stable soon."

So I am just waiting and hoping for the best.

jugglinmike / chrome-screenshot-race

Discussion #1