Clarify reftest screenshot timing in docs (e.g. the double rAF)

Hexcles commented 5 years ago

Here's our current documentation regarding the default reftest timing (i.e. without reftest-wait) https://web-platform-tests.org/writing-tests/reftests.html#controlling-when-comparison-occurs:

By default reftest screenshots are taken after the load event has fired, and web fonts (if any) are loaded. [...] Note that in neither case is exact timing of the screenshot guaranteed: it is only guaranteed to be after those events.

This is perhaps too vague and I have had a few people asking about more detailed guarantees regarding layout, rasterization, etc. I'm not a layout expert and I don't know if the load event guarantees any of these in the specs. Yet I find folks usually satisfied when they see the internal "temporary" workaround in wptrunner that waits for 2 requestAnimationFrames (otherwise they would want to add a double-rAF at the end of the page themselves).

The question is: does the double-rAF workaround provide any additional timing guarantees to load? If so, then many tests have probably relied on these guarantees since then, and we should either codify the additional timing requirements in docs, or remove this "temporary" workaround and fix the broken tests.

gsnedders commented 5 years ago

This is perhaps too vague and I have had a few people asking about more detailed guarantees regarding layout, rasterization, etc.

There are limits to what we can control over WebDriver and JS.

With regards to rAF, see https://html.spec.whatwg.org/#update-the-rendering. While I believe we can guarantee layout will have occurred, we can't guarantee that it has been rasterized (as that's not observable; it just so happens in the Blink case that the double-rAF guarantees it).

Hexcles commented 5 years ago

For a reftest, wouldn't waiting until layout is observable a desirable default? Even if we can't guarantee it, it's usually the case in practice. Shall we say we try to make sure layout is observable, and discourage test authors from writing double-rAF simply for this purpose themselves?

gsnedders commented 5 years ago

Yeah, we should probably just make it obvious that the intention is very much to guarantee it, nothing else makes sense…

guest271314 commented 5 years ago

@Hexcles What is the test trying to achieve?

Hexcles commented 5 years ago

@guest271314 fairly simple: set the layout via CSS, change the layout via JavaScript in body (before load), and hope the screenshot would capture the new layout. The current docs only say that we wait for load which would guarantee that the script has completed and the relayout has happened, but IIUC technically there's no guarantee whether it's observable/rasterized.

This is such a simple and common use case. And as @gsnedders said, the only sensible thing to say is that we try to guarantee the frame update has happened in practice and tests don't need to do anything special (like the double-rAF hack that isn't even standard).

guest271314 commented 5 years ago

@Hexcles Which load event is being referenced? window, document.body, an img element?

How is the layout changed in JavaScript? Directly at the styleSheet, using style property of an HTMLElement, or changing text within a <style> element?

Is this test and context related to a headless browser taking a screenshot of a requested document?

There are several approaches which can be used in conjunction to "track" changes made to CSS.

If the context is comparing the actual pixels of a screenshot of a document before and after a change in the actual styleSheet directly, and the presumptive change is different from the original screenshot, a screenshot can be taken before and after the JavaScript code which changes the layout and each pixel can be compared using <canvas> or OffscreenCanvas. Taken a step further, the document can be captured and drawn continuously exactly before JavaScript is run and up until the pixels do change.
If the context is determining if the resource is requested at all which will change the visual layout PerformanceObserver can be used.
If the context is determining the duration between the JavaScript being run and the layout change occurring, a combination of 1) and 2) can be used.
If the context does not involve updating the layout with an image programmatically using JavaScript the styleSheet itself can be queried for changes.

What exactly is the context? Determine if a) the resource is requested at all; b) if layout is changed; c) exactly or approximately when the layout is changed?

Re the linked test, is the load event of the <link> element used at all? Or only document.body or window load event?

Why is a screenshot used for the test? The example at the linked document is testing whether an HTML element is appended to the document or not.

A timer is not necessary to achieve the expected result if the requirement is not to determine the duration between JavaScript changing layout and the actual layout changing in the document. An async iterator, ReadableStream WritableStream pair, MutationObserver, PerformanceObserver and/or combination of MutationObservers can be used to read the pixels until one or more expected changes occur.

guest271314 commented 5 years ago

@Hexcles

fairly simple: set the layout via CSS, change the layout via JavaScript in body (before load),

Can you link to the actual test which raised this issue?

Some similar inquiries and related code that might be helpful (can include the code here if necessary)

PerformanceObserver

Get actual image url after redirect

const key = "unsplash";

const url = `https://source.${key}.com/random/2560x1440`;

let bloburl = void 0;

let img = new Image;

const getResourceName = () => {

  let resource = "";

  const observer = new PerformanceObserver((list, obj) => {
    // alternatively iterate all entries, check `"name"`
    // property value for `key`, break loop if specific resource requested
    for (let entry of list.getEntries()) {
      let {name: res} = entry.toJSON();
      resource = res;
    }
  });

  observer.observe({
    entryTypes: ["resource"]
  });

  return fetch(url)
    .then(response => response.blob())
    .then(blob => {
      observer.disconnect();
      bloburl = URL.createObjectURL(blob);
      img.src = bloburl;
      document.body.appendChild(img);
      return resource
    })

}

Handling <?xml-stylesheet> similar to ?

window.onload = () => {
  let resource;
  const observer = new PerformanceObserver((list, obj) => {
    for (let entry of list.getEntries()) {
      for (let [key, prop] of Object.entries(entry.toJSON())) {
        if (key === "name") {
          resource = prop;
          var nodes = document.childNodes;
          _nodes: for (let node of nodes) {
            if (node.nodeType === 7 || node.nodeType === 8 
            && node.nodeValue === pi.nodeValue) {
              let url = node.baseURI 
                        + node.nodeValue.match(/[^href="][a-z0-9/.]+/i)[0];
              if (url === resource) {
                observer.disconnect();
                // use `setTimeout` here for
                // low RAM, busy CPU, many processes running
                let stylesheets = node.rootNode.styleSheets;
                for (let xmlstyle of stylesheets) {
                  if (xmlstyle.href === url && url === resource) {
                    let rules = (xmlstyle["cssRules"] || xmlstyle["rules"]);
                    for (let rule of rules) {
                      // do stuff
                      console.log(rule, rule.cssText, rule.style, xmlstyle);
                      break _nodes;
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  });

  observer.observe({
    entryTypes: ["resource"]
  });

  const pi = document.createProcessingInstruction('xml-stylesheet',
    'href="style.css" type="text/css"');
  document.insertBefore(pi, document.documentElement);

}

MutationObserver (observing the document)

window.onload = function() {
  let observer = new MutationObserver(function(mutations) {
    console.log(mutations)
    for (let mutation of mutations) {
      for (let node of mutation.addedNodes) {
        if (node.nodeName === "xml-stylesheet") {
          let url = node.baseURI 
                    + node.nodeValue.match(/[^href="][a-z0-9/.]+/i)[0];
          setTimeout(function() {
            for (let style of document.styleSheets) {
              if (style.href === url) {
                observer.disconnect();
                // do stuff
                console.log(style)
              }
            }
          // adjust `duration` to compensate for device
          // low RAM, busy CPU, many processes running
          }, 500)  
        }
      }
    }
  });

  observer.observe(document, {
    childList: true
  });

  const pi = document.createProcessingInstruction('xml-stylesheet',
    'href="style.css" type="text/css"');
  document.insertBefore(pi, document.documentElement);

}

Does the reftest try to address CSS rules inserted/added at a styleSheet itself?

Empty style element with working CSS rules?

<!DOCTYPE html>
<html>
  <head>
    <style id="style"></style>
  </head>
  <body>
    <div>abc</div>
    <p>123</p>
    <script>
      const style = document.querySelector("#style");
      const {sheet} = style;
      sheet.insertRule("div{color:blue}", 0);
      sheet.addRule("p", "color:green", 1);
      console.log(style.textContent);
      style.textContent = "";
      console.assert(style.textContent.length > 0, [style.textContent]); // assertion failed
      console.log(style.textContent === ""); // true
      /*  after changing rules at the styleSheet itself, check HTML or take screenshot */
    </script>
  </body>

</html>

and CSS pseudo elements?

guest271314 commented 5 years ago

@Hexcles From what gather <link rel="import"> has been deprecated in lieu of import and dynamic import(), where load event of <link> element could be used, though MutationObserver() could alternatively be used, see Is there a way to know if a link/script is still pending or has it failed; http://plnkr.co/edit/mDMGGC75JQFZR6R206qG?p=preview

I'd like to know from the html below, if link[rel=import], link[rel=stylesheet], img and script are pending/loaded/failed/aborted without the need to add listeners beforehand and at any point after the event has happened

var resourceNodeSelector = 'link[href],script[src],img[src]';
function watchResource (n) {
    var url = n.href || n.src;

    if (!n.matches || !n.matches(resourceNodeSelector)) {
        return;
    }

    if (n.status) {
        return;
    }

    n.status = resourceObserver.promises[url] = new Promise(function (resolve, reject) {
        n.addEventListener('load', resolve);
        n.addEventListener('error', reject);
        n.addEventListener('abort', reject);
        n.addEventListener('unload', function (l) { delete resourceObserver.promises[url]} );
    });
    n.status.catch(function noop () {}); //catch reject so that it doesn't cause an exception
}

var resourceObserver = new MutationObserver(function (mutations) {
    document.querySelectorAll(resourceNodeSelector).forEach(watchResource);
});
resourceObserver.promises = {};
resourceObserver.observe(window.document, {childList: true, subtree: true});

Hexcles commented 5 years ago

Re the linked test, is the load event of the element used at all? Or only document.body or window load event?

I'm a bit confused. I haven't linked to any example. Which test are you referring to?

Here's one example: https://crrev.com/c/1542305/5/third_party/blink/web_tests/external/wpt/css/css-tables/toggle-row-display-property-001.html

The author was not sure if any extra step was needed to ensure that screenshots are taken after the layout change caused by the script becomes observable, since the doc only says screenshots are taken after the load event (on window).

Also sorry where is all these comments about <link> or import from? I'm not sure we're talking about the same issue? @guest271314

guest271314 commented 5 years ago

@Hexcles

Also sorry where is all these comments about <link> or import from?

"Test File" and "Reference File" at "Example Reftests" at the link https://web-platform-tests.org/writing-tests/reftests.html#controlling-when-comparison-occurs.

No load event is used for the <link> element at the example test linked to at last comment. The simplest solution would be to await load event of the "Reference File" at <link> element, or alternatively use fetch() and .text() as a representation of the "Reference File" to avoid using load event, then use MutationObserver and take the screenshot at each iteration when style.display is set (last mutation of N elements attribute being set), or await until completion of the for loop if the expected result is for all CSS properties set are expected to be in the screenshot.

guest271314 commented 5 years ago

@Hexcles Does the test currently actually rely on requestAnimationFrame and not observing changes to the elements before taking the screenshot?

Hexcles commented 5 years ago

Hmm looks like you're not familiar with the implementation of WPT reftests.

Those <link> tags are parsed and used by our Python test runner, not the browser. The runner controls the browser (via WebDriver) to take screenshots of the test and the reference separately and compares them. That's what a reftest is in WPT.

This issue isn't about that. We are only talking about within a page (be it test or reference), what screenshot timing guarantee we provide. Take the simple example I linked to above: does window:load guarantee that the new layout is already rasterized to the screen in this case? I'm not a layout expert, but I take from various discussions that the spec doesn't say so, which causes some test authors to consider using various tricks to ensure the test is more stable or technically correct.

Hexcles commented 5 years ago

Test authors want whatever changes to the layout done before the load event (usually in a <script> tag in <body>) to be captured in the screenshot. That's the most intuitive and also matches our current implementation, but the documentation doesn't strictly say so.

guest271314 commented 5 years ago

@Hexcles Why is load event of window and requestAnimationFrame used? Or, relevant to this issue, why does the make the claim that load event of window is a "guarantee" for when to take the screenshot? Instead, for the linked example test style="display:none" can be set at the <tr> elements within the HTML, instead of a <style> element, then use MutationObserver to check if rows.length mutation events have occurred (and alternatively also check the values at the styleSheet for each specific element), then take the screenshot.

Then you can run the tests in a separate window then get the results in the current window using SharedWorker.

Hexcles commented 5 years ago

why does the make the claim that load event of window is a "guarantee" for when to take the screenshot?

That's the default of our reftest implementation. We need to define a sensible default timing for screenshots. Do you have better suggestions?

And of course, one might be able to essentially implement a reftest using a worker and a canvas without relying on the external screenshot comparison functionality provided by wptrunner, but having to implement all of that in every single reftest is not very ergonomic. Note that we already have tens of thousands of these tests.

And if we stick to the notion of taking a screenshot externally via WebDriver, we have to deal with the rasterization problem even with a MutationObserver. Does a mutation event guarantee that the change has been rasterized to the screen?

guest271314 commented 5 years ago

@Hexcles Right now requestAnimationFrame does not "guarantee" anything related to the claim made, as no check is performed to verify that imported content has loaded or the elements' have been changed in some way by JavaScript. Even when changed by JavaScript, e.g., if one were to place the code which executes screenshot after for loop, the changes still might not have been made.

You are correct, am not familiar with the server portion of wpt (an entire syllabus and test vector itself, evidently); usually run tests at file: protocol.

Do you have better suggestions?

"better" is entirely subjective. Would make the entire process wpt uses to submit tests far more simple, with a dedicated team/individuals who write tests, among other changes which could assist in making submissions of bugs and items that users determine should be tested, and their own tests, less of a learning curve of the wpt procedure, GitHub, etc.

In any event, the code below should demonstrate the issue of a presumption that the elements' style attribute is (or should be) updated following the for loop, and an example solution using MutationObserver, without any consideration of merging this example into the current wpt procedure other than the presumption that the code which executes the screenshot command can be called at any time, not based on an arbitrary timer

<!DOCTYPE html>
<html>

<head>
  <link rel="author" title="David Grogan" href="mailto:dgrogan@chromium.org">
  <link rel="help" href="https://www.w3.org/TR/CSS2/visuren.html#display-prop">
  <link rel="help" href="https://bugs.chromium.org/p/chromium/issues/detail?id=764031">
  <link rel="match" href="toggle-row-display-property-001-ref.html">
  <meta name="flags" content="" />
  <meta name="assert" content="After setting display:table-row on tr elements with display:none, the table should look same as if display:none were never used." />

  <style>
    td,
    th {
      border: 1px solid black;
    }

    tr {
      display: none;
    }
  </style>
</head>

<body>
  <table id=theTable>
    <tr style="display:none">
      <th colspan="4">This cell must span all columns for repro in chrome</th>
    </tr>
    <tr>
      <td rowspan="3" style="height:200px">A</td>
    </tr>
    <tr style="display:none">
      <td>1</td>
      <td>2</td>
      <td>3</td>
    </tr>
    <tr style="display:none">
      <td>1</td>
      <td>2</td>
      <td>3</td>
    </tr>
  </table>

  <script>
    theTable.offsetTop;

    let n = 0;

    let rows = theTable.querySelectorAll('tr');

    console.log(`rows.length: ${rows.length}`);

    rows.forEach(row => {

      let observer = new MutationObserver(function(mutations) {
        mutations.forEach(function(mutationRecord) {
          ++n;
          if (n === rows.length) {
            console.log(`${n} is expected to be 4 right now, actual value of ${n} right now: ${n}`);
            // take screenshot
          }
        });
      });

      observer.observe(row, {
        attributes: true,
        attributeFilter: ["style"]
      });

    });

    for (var i = 0; i < rows.length; i++) {
      rows[i].style.display = 'table-row';
    }

    console.log(`${n} is expected to be 4 right now, actual value of ${n} right now: ${n}`);
  </script>
</body>

</html>

guest271314 commented 5 years ago

@Hexcles See MutationObserver watching for style change only firing once https://stackoverflow.com/q/54639719. Note the style attribute set at the actual HTML. The linked Chromium test mixes setting style attribute at HTML and text of a <style> element, which could reflect different values in some cases; see the last question and code at this comment https://github.com/web-platform-tests/wpt/issues/16123#issuecomment-479639995, where a user was trying to "scrape" content from *eddit though changes could have been made at the actual styleSheet without necessarily being reflected at the <style> element. Even then the user still need to await the actual changes to the DOM; e.g., the code using MutationObserver could also check the styleSheet before calling <screenshot()>.

guest271314 commented 5 years ago

@Hexcles https://plnkr.co/edit/j7nx5A?p=preview

Hexcles commented 5 years ago

Right now requestAnimationFrame does not "guarantee" anything related to the claim made

No, it doesn't. It's a non-interoperable hack that works on Chrome: if you have a nested double requestAnimationFrame, the innermost callback would happen before the second animation frame and after the first animation frame, which would (on Chrome) guarantee that any previous layout change has been flushed to the frame.

The problem is that we are talking about external screenshots, not inside JavaScript. Your workaround essentially has the same problem in the sense that the MutationObserver only guarantees that JavaScript has observed the change of row.length but not that if we take a screenshot of the browser window at this precise moment we get the pixels of the new layout.

We are not talking about DOM or any JavaScript-observable properties here; we are talking about the actual pixels of the browser window. The difference between various style is off topic.

guest271314 commented 5 years ago

@Hexcles No, the different applications of style and what is reflected at the DOM is not "off-topic", given the linked test changes the style. That is, if the concern is actually determining how/when the layout changes using different means. In any event, if you omit that consideration, that is your choice.

The problem is that we are talking about external screenshots, not inside JavaScript. Your workaround essentially has the same problem in the sense that the MutationObserver only guarantees that JavaScript has observed the change of row.length but not that if we take a screenshot of the browser window at this precise moment we get the pixels of the new layout.

You can take a screenshot of the expected HTML and take screenshots until the pixels are different, then send that last screenshot, or if you insist on using an arbitrary timer, fail the test after N seconds. queueMicrotask is dispatched before requestAnimationFrame, though none of that guarantees the changes have been made to the DOM, hence this issue.

Still not sure why a screenshot is being used to observe differences to the DOM? Is this test to determine the duration between the JavaScript code which changes the style of an element and when the styles are reflected in the DOM?

You could alternatively use CSS animations or Web Animations API (an arbitrary timer) then take screenshot at animationend event or onfinish.

Hexcles commented 5 years ago

Still not sure why a screenshot is being used to observe differences to the DOM? Is this test to determine the duration between the JavaScript code which changes the style of an element and when the styles are reflected in the DOM?

This is the key question. Fundamentally, if you only rely on JavaScript to assert DOM properties, you'd never be able to test the rendering stack of a browser. Reftests are end-to-end, blackbox tests that cover both the layout calculation and graphic rendering. That's why an external screenshot of the window is needed.

In some cases, the layout calculation is correct, but there can be a graphic glitch (incorrect paint). In other cases, it's simply easier to write a reftest and rely on screenshots to check the layout (see how much longer your workaround is). Again, this is all about developer ergonomics.

guest271314 commented 5 years ago

@Hexcles Using CSS animation and/or Web Animation API you could check window.getComputedStyle() and/or .getBBoundingClientRect() during the animation and after the animation. Taking screenshots during the entire procedure. And/or simply capture the entire procedure using .getDisplayMedia(). There are several options available.

guest271314 commented 5 years ago

@Hexcles Taking a single screenshot at an arbitrary point in time in the future does not appear to convey the entire procedure and actual results.

Hexcles commented 5 years ago

Using CSS animation and/or Web Animation API you could check window.getComputedStyle() and/or .getBBoundingClientRect() during the animation and after the animation.

These, again, do not necessarily reflect the graphics in case there are bugs in the rendering pipeline.

getDisplayMedia

This is not available on all the browsers WPT runs on.

Taking a single screenshot at an arbitrary point in time in the future does not appear to convey the entire procedure and actual results.

If you're re-stating the docs that I quoted in the original issue and expressing your dissatisfaction of it, I totally agree. The statement is too vague. And changing it to better reflect the implementation and reduce confusion is exactly what this issue is about...

guest271314 commented 5 years ago

@Hexcles

These, again, do not necessarily reflect the graphics in case there are bugs in the rendering pipeline.

How do you know that without trying? Find and fix those bugs as well?

This is not available on all the browsers WPT runs on.

Have not tried the wpt server. .getDisplayMedia() is available at the latest, ostensibly, FOSS browsers Chromium and Firefox (Nightly). There is a Chrome specific exploit being used as mentioned earlier above, therefore .getDisplayMedia() could be used for the tests at those two browsers, to capture the entire procedure, if necessary, capture and analyze each frame captured; and/or, draw the images at X framerate to an array, then (or in "real-time") analyze each image data. That is, if the requirement is to observe the entire procedure of the blackbox (the browser implementation).

When test, generally do so at file: protocol or at console at each browser. The browser itself is the blackbox.

If you're re-stating the docs that I quoted in the original issue and expressing your dissatisfaction of it, I totally agree. The statement is too vague. And changing it to better reflect the implementation and reduce confusion is exactly what this issue is about...

Do not entertain satisfaction or dissatisfaction with a document. It is just a document. Would only posit that if the requirement is to observe the implementation of the blackbox, then capture the entire procedure. How you implement that is your choice. Clarification of the requirement is what is most important from perspective here: What are you trying to observe by taking one or more screenshots, and how are the results measured; comparing to a specification; to other implementations; what are the expected results in T time and clearly defined layout how can you compare pixels (from observation Firefox renders video recorded at the browser "clearer" than Chromium does; how does an individual clearly describe "clearer"); what is the expected result?

guest271314 commented 5 years ago

@Hexcles Re

getDisplayMedia

This is not available on all the browsers WPT runs on.

Chromium, Firefox and Edge support the Screen Capture API as getDisplayMedia(). Have not tried Edge implementation. adapter.js https://webrtc.github.io/adapter/adapter-latest.js is available if necessary. Safari evidently supports the API behind a flag see https://bugs.webkit.org/show_bug.cgi?id=186294, https://bugs.webkit.org/show_bug.cgi?id=181333. There is also html2canvas https://html2canvas.hertzen.com/features/ which could be used. See also https://jakearchibald.com/2015/tasks-microtasks-queues-and-schedules/, https://jakearchibald.com/2013/solving-rendering-perf-puzzles/; note the timeline recorder feature used, which appears to be what the current test is trying to emulate with a single screenshot, where that functionality could be extracted from the source code and implemented as a wpt module to be used for these tests.

guest271314 commented 5 years ago

@Hexcles Some code which could be adapted/adjusted to perform the test https://github.com/guest271314/MediaFragmentRecorder/blob/getdisplaymedia-webaudio/MediaFragmentRecorder.html. Chromium/Chrome does not include cues in the resulting webm video file, though ts-ebml https://github.com/legokichi/ts-ebml can be used to insert the timestamps for the ability to get an approximation of when the painting actually occurs.

Yes, the current documentation should be clarified.

web-platform-tests / wpt

Clarify reftest screenshot timing in docs (e.g. the double rAF) #16123