await ctx.commit() is an illusion

In https://github.com/junov/OffscreenCanvasAnimation/commit/5f969ddcee610908c01694ed6f1b1ec717afaa78#diff-fe83e208ade4a4d8de4dc610010e1271R91, there is an example to illustrate how to set up a synchronously blocking main loop:

"Another possibility is to use the async/await syntax:"

async function animationLoop() {
  var promise;
  do {
    //draw stuff
    (...)
    promise = ctx.commit()
    // do post commit work
    (...)
  } while (await promise);
}

However, this does not actually do what readers expect, and in particular it does not solve what @OleksandrChekhovskyi was asking for in https://discourse.wicg.io/t/offscreencanvas-animations-in-workers/1989/15. That is, the comment

"Okay, I think we've got it. Basically, the syntax for getting a commit that throttles by blocking (as Oleksadr suggests) would simply be "await context.commit();". And of course, this is much better than hard-blocking the entire thread."

unfortunately does not solve the use case for a synchronous main loop.

The subtle issue is that an async function is a function that returns a Promise: reaching an await suspends that function and immediately returns control to its caller. The same happens up the whole chain of async callers, so execution continues from the first non-async function on the stack, long before the awaited promise resolves. This effect breaks the computation model that is required for WebAssembly/Emscripten-based applications that set up their own main loops.

Here is a more concrete example based on OffscreenCanvas prototypes, which attempts to set up a blocking main loop, but fails due to the subtlety:

webgl_modal_loop.html:

<html><body><canvas id='canvas'>
<script>
var htmlCanvas = document.getElementById("canvas");
var offscreen = htmlCanvas.transferControlToOffscreen();
var worker = new Worker("webgl_worker.js"); 
worker.postMessage({canvas: offscreen}, [offscreen]);
</script>
</body></html>

webgl_worker.js:


/*
// This is a simulation of a blocking GL.commit(): (can try as alternative to real GL.commit() if OffscreenCanvas not yet implemented)
function commit() {
  var displayRefreshRate = 1; // Simulate a 1Hz display for easy observing
  return new Promise(function (resolve, reject) {
    setTimeout(function() {
      resolve();
    }, 1000/displayRefreshRate);
  });
}
*/

var gl = null;

async function renderLoop() {
  var frameNumber = 0;
  // Start our modal rendering loop
  for(;;) {
    gl.clearColor(performance.now(), performance.now(), performance.now(), performance.now());
    gl.clear(gl.COLOR_BUFFER_BIT);
    await gl.commit();
//    await commit(); // Alternatively to try out simulated gl.commit() in the absence of real thing

    console.log('rendered frame ' + frameNumber++);
    if (frameNumber > 10) break; // Stop test after 10 frames to not flood the browser
  }
}

function init(evt) {
  console.log('init');
  // Startup initialization for the application
  var canvas = evt.data.canvas;
  gl = canvas.getContext("webgl");
}

function runGame() {
  console.log('runGame');
  renderLoop();
}

function deinit() {
  console.log('deinit');
  gl = null; // tear down
}

onmessage = function(evt) {
  // Game "main": init, run loop, and deinit
  init(evt);
  runGame();
  deinit();
};

The expectation from a synchronously blocking execution is that the above application should print out

init
runGame
rendered frame 0
rendered frame 1
rendered frame 2
rendered frame 3
rendered frame 4
rendered frame 5
rendered frame 6
rendered frame 7
rendered frame 8
rendered frame 9
rendered frame 10
deinit

but instead, running the page will print out

init
runGame
deinit
rendered frame 0
<throw since gl is null>

This is because renderLoop() returns to its caller as soon as the first await gl.commit(); is reached, so the onmessage function continues executing immediately and calls deinit(), breaking the synchronous programming model.

I am currently implementing OffscreenCanvas support for Emscripten, and trying to figure out how to implement vsync synchronization when a Worker is rendering via OffscreenCanvas. In the absence of a GL.commit(blockUntilVsyncFinished=true) type of API or similar help from the OffscreenCanvas spec, my current thinking is to set up a requestAnimationFrame loop in the main browser thread, and use that to ping "vsync finished" events to a Worker via SharedArrayBuffer. This will work if OffscreenCanvas-based rendering is guaranteed to still allow observing "proper" requestAnimationFrame synchronization on the main browser thread, though I am not sure whether the OffscreenCanvas spec currently says anything about this.

Interesting problem... I think it would not be unreasonable to have a synchronous option (e.g. {blockUntilVsync: true}) for commit if it solves a problem that is otherwise hard to solve.

In your example, would it not be possible to just call deinit() from renderLoop()?

    if (frameNumber > 10) {
        deinit();
        break;
    }

In your example, would it not be possible to just call deinit() from renderLoop()?

Yeah, in this small example, it would be easy to fix like that. In general the portability problem comes from existing codebases where the main loop can be deep in the call stack, so such a transformation would require changing all parent functions leading to the main loop. Essentially, the change needed to make await play nicely with synchronously blocking loops reduces to the same problem as general sync->async transformation of a codebase. Using the await keyword in conjunction with offscreenCanvas.ctx.commit(); can have other good uses, but for implementing a synchronously blocking loop it is not feasible.
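To make that portability cost concrete, here is a minimal sketch of what the transformation looks like when the loop sits a few calls deep. The names fakeCommit, runLevel and main are hypothetical stand-ins, and fakeCommit only simulates ctx.commit() with an already-resolved promise. The point is just that every function between the entry point and the loop has to become async and await its callee.

```javascript
// Hypothetical stand-in for ctx.commit(): a promise that resolves on a later microtask.
const log = [];
function fakeCommit() {
  return Promise.resolve();
}

// The loop itself must be async in order to use await ...
async function renderLoop(frames) {
  for (let i = 0; i < frames; i++) {
    await fakeCommit();
    log.push('rendered frame ' + i);
  }
}

// ... and so must every caller above it, all the way up to the entry point.
async function runLevel() {
  await renderLoop(2);
}

async function runGame() {
  await runLevel();
}

async function main() {
  log.push('init');
  await runGame();
  log.push('deinit'); // runs only after the loop has finished
}

const finished = main();
```

If any one of runLevel, runGame or main drops its await, deinit runs before the frames again, which is why this is hard to retrofit onto a compiled codebase with a deep call stack.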

Interesting problem... I think it would not be unreasonable to have a synchronous option (e.g. {blockUntilVsync: true}) for commit if it solves a problem that is otherwise hard to solve.

The pinging of rAF events from main browser thread to web worker will work to some degree, but it will mean that the worker will then be real-time dependent on the main browser thread, which would mean that if main thread is doing a relatively long operation (e.g. GC, DOM relayout, a 100msec IndexedDB store), then the Web Worker would stall from rendering. One could work around this by adding a maximum timeout based on the estimated (continuously benchmarked) vsync interval: if more milliseconds than that have elapsed in the Worker without hearing from the main thread, assume that a new vsync has passed and present anyway.

That kind of machinery feels not much better than a fixed sleep(1000/60), so the question practically boils down to whether we can do something on the OffscreenCanvas spec side that would be more accurately tied to waiting for vsync and give better guarantees than a sleep(1000/60).
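For reference, the rAF-ping-with-timeout machinery could be sketched roughly as follows, assuming the main thread and the Worker share an Int32Array backed by a SharedArrayBuffer. The helper names createVsyncChannel, signalVsync and waitForVsync are hypothetical; only SharedArrayBuffer and Atomics are real APIs here. In a browser, signalVsync would be called from a requestAnimationFrame callback on the main thread and waitForVsync from the Worker, since Atomics.wait is not allowed on the main browser thread.

```javascript
// Index of the shared frame counter inside the Int32Array.
const FRAME_COUNTER = 0;

// Create the shared channel (would be posted to the Worker once at startup).
function createVsyncChannel() {
  return new Int32Array(new SharedArrayBuffer(4));
}

// Main thread side: bump the counter and wake any waiting Worker.
function signalVsync(channel) {
  Atomics.add(channel, FRAME_COUNTER, 1);
  Atomics.notify(channel, FRAME_COUNTER);
}

// Worker side: block until the counter moves past lastSeenFrame,
// or until timeoutMs elapses (the "assume a vsync happened" fallback).
function waitForVsync(channel, lastSeenFrame, timeoutMs) {
  // Returns 'ok', 'not-equal' (counter already advanced) or 'timed-out'.
  const result = Atomics.wait(channel, FRAME_COUNTER, lastSeenFrame, timeoutMs);
  return {
    frame: Atomics.load(channel, FRAME_COUNTER),
    timedOut: result === 'timed-out'
  };
}
```

The timeout value is the weak point: it has to come from an estimated refresh interval, which is exactly the part that makes this no better than a fixed sleep.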

Practically the good stuff that is desired is the same as with https://msdn.microsoft.com/en-us/library/windows/desktop/bb174576(v=vs.85).aspx and https://msdn.microsoft.com/en-us/library/windows/desktop/bb509554(v=vs.85).aspx. When displaying content in windowed mode where it is composited with respect to other web page content, it is probably impossible to let OffscreenCanvas dictate how often to present, but it would be good for the .commit() API to be flexible, since perhaps when paired with canvas.requestFullscreen() scenarios, the browser could imaginably bypass all regular browser compositing and do special fast tracks, and things like DXGI_PRESENT_DO_NOT_WAIT, DXGI_PRESENT_RESTART might be possible. Lowest possible latency for VR, and G-Sync and FreeSync are interesting considerations in fullscreen mode, so perhaps accepting a parameters object for .commit({...}) could be a future-proof way to expand to supporting those?

The pinging of rAF events from main browser thread to web worker will work to some degree, but it will mean that the worker will then be real-time dependent on the main browser thread, which would mean that if main thread is doing a relatively long operation (e.g. GC, DOM relayout, a 100msec IndexedDB store), then the Web Worker would stall from rendering.

Yeah... I've experimented quite a bit with propagating main thread rAF to the worker. Another problem you run into is if the Worker is unable to hit the target frame rate, then you start accumulating a backlog of rAF messages. Then you need to mitigate that by making the Worker send Ack signals back to the main thread.
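A minimal sketch of that Ack-based mitigation, seen from the main thread side, might look like the following. AckedTicker and its message shape are hypothetical names for illustration, and the send callback stands in for worker.postMessage.

```javascript
// Forwards rAF ticks to a Worker, but never more than one at a time:
// a new tick is sent only after the previous one has been acknowledged.
class AckedTicker {
  constructor(send) {
    this.send = send;          // e.g. (msg) => worker.postMessage(msg)
    this.awaitingAck = false;  // true while the Worker still owes us an Ack
    this.dropped = 0;          // ticks skipped because the Worker was busy
  }

  // Call from the requestAnimationFrame callback.
  onAnimationFrame(timestamp) {
    if (this.awaitingAck) {
      this.dropped++;
      return false;
    }
    this.awaitingAck = true;
    this.send({ type: 'tick', timestamp: timestamp });
    return true;
  }

  // Call when the Worker reports it has finished the frame.
  onAck() {
    this.awaitingAck = false;
  }
}
```

This keeps the message queue from growing, at the cost of an extra round trip per frame between the two threads.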

so the question practically boils down to whether we can do something on the OffscreenCanvas spec side that would be more accurately tied to waiting for vsync and give better guarantees than a sleep(1000/60).

I'm not sure I understand the problem. The resolution of the promise returned by commit() is tied to vsync and is independent of the rate at which the main thread refreshes the page. The way we've implemented this in Chrome is basically equivalent to triple buffering. The first time commit() is called, it resolves its promise immediately, in order to allow the accumulation of a two frame backlog in the presentation pipeline. This gives us really good smoothness, at the expense of latency.

perhaps accepting a parameters object for .commit({...}) could be a future-proof way to expand to supporting those?

Perhaps. But I think switching in and out of a low latency code path on the fly might be tricky on some OSes, so it might be better to control this via a context creation parameter rather than a commit parameter. FWIW, supporting low-latency rendering mode is a high priority feature on our side.

While we're on the topic of adding options to control commit behavior, another request we've gotten from developers is to create the ability to cut the frame rate by a constant factor (e.g. 30fps or 20fps). Motivation: games/animations that cannot hit 60fps look smoother if presented at a lower but steady frame rate. Right now some devs are implementing this by using rAF + monitoring time since last rAF in order to determine whether they should skip the current frame. That method does not work perfectly because rAF is not always called exactly on vsync, especially if the thread was busy, and it usually assumes that the display hardware runs at 60fps, which is becoming a flaky assumption with new generations of displays. So it would be nice to offer some kind of option to ask the browser to decimate the frame rate
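The rAF plus elapsed-time workaround mentioned above usually looks something like this sketch. makeFrameLimiter is a hypothetical name, and the 1 ms tolerance is an arbitrary assumption to avoid skipping frames that arrive slightly early.

```javascript
// Returns a function that decides, per rAF timestamp (in ms),
// whether enough time has passed to render another frame.
function makeFrameLimiter(targetFps) {
  const minInterval = 1000 / targetFps;
  const toleranceMs = 1; // assumed slack for timer jitter
  let lastRendered = -Infinity;
  return function shouldRender(timestamp) {
    if (timestamp - lastRendered < minInterval - toleranceMs) {
      return false; // too soon, skip this rAF callback
    }
    lastRendered = timestamp;
    return true;
  };
}

// Typical use inside a rAF callback (drawScene is hypothetical):
//   const limit30 = makeFrameLimiter(30);
//   function onFrame(ts) { if (limit30(ts)) drawScene(); requestAnimationFrame(onFrame); }
```

With perfectly regular 60Hz timestamps this renders every second tick, but with regular 144Hz timestamps it renders every fifth tick, i.e. 28.8fps rather than 30, which illustrates why the time-based approach does not produce the intended rate on every display.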

While we're on the topic of adding options to control commit behavior, another request we've gotten from developers is to create the ability to cut the frame rate by a constant factor (e.g. 30fps or 20fps). Motivation: games/animations that cannot hit 60fps look smoother if presented at a lower but steady frame rate. Right now some devs are implementing this by using rAF + monitoring time since last rAF in order to determine whether they should skip the current frame.

This is a familiar conversation from Emscripten support requests as well. For reference (to self mostly), here are some pointers to how the native world does it:

Native Windows OpenGL:

WGL_EXT_swap_control is a WGL extension that specifies swapping behavior, with functions wglSwapIntervalEXT and wglGetSwapIntervalEXT.
SwapBuffers is a core Win32 GDI function that swaps according to presentation interval set by wglSwapIntervalEXT.

Native Linux OpenGL:

GLX_EXT_swap_control is a GLX extension that specifies swapping interval, with the function glXSwapIntervalEXT.
glXSwapBuffers is a core GLX function that swaps according to presentation interval set by glXSwapIntervalEXT.

Native Apple OS X/macOS OpenGL:

In CGL, CGLSetParameter() can be called with kCGLCPSwapInterval, though my understanding is that using CGL directly is becoming rarer on OS X these days. Higher-level abstraction APIs like Cocoa/AppKit are more common, but they also seem to expose the functionality for setting the swap interval, e.g. Cocoa/AppKit has NSOpenGLCPSwapInterval. They also still seem to build on top of CGL and expose the CGL context, so CGL can be utilized.
To present, one uses the oddly named NSOpenGLContext::flushBuffer

(Android) EGL:

eglSwapInterval to choose presentation interval
eglSwapBuffers to present

Direct3D 9.0c:

No separate functions for setting presentation interval and presenting, but the same function IDirect3DDevice9::Present is used both to present and to wait the specified number of intervals: D3DPRESENT_INTERVAL_IMMEDIATE/ONE/TWO/THREE/FOUR

Direct3D 11:

Similar to D3D 9, no separate functions for presenting, but IDXGISwapChain::Present is used, where instead of an enum from 0-4, an integer is taken, although it is documented that only integers in the range 0-4 work.

Direct3D 12:

Shares the same IDXGISwapChain API with Direct3D 11 and presents in the same way.

Swap intervals that decimate presenting to every second, third, fourth or so vsync are an abstraction that is prevalent across all APIs and OSes. Historically, that is what motivated me to add an identical API to Emscripten as well, where we have

emscripten_set_main_loop_timing that allows setting a decimated swap interval, e.g. emscripten_set_main_loop_timing(EM_TIMING_RAF, 2); calls user specified render loop on every second requestAnimationFrame(), filtering out every other rAF call.
To set vsyncless rendering, emscripten_set_main_loop_timing(EM_TIMING_SETIMMEDIATE, 0); can be called. This renders using the setImmediate API if available, or by postMessage()ing to self if not.
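As an illustration of the filtering idea behind EM_TIMING_RAF with an interval of 2, here is a small JavaScript sketch. It is not Emscripten's actual implementation; makeDecimatedLoop and renderFrame are hypothetical names.

```javascript
// Wraps a render callback so that it runs only on every
// swapInterval-th requestAnimationFrame tick.
function makeDecimatedLoop(swapInterval, renderFrame) {
  let tick = 0;
  return function onAnimationFrame(timestamp) {
    const shouldRender = tick % swapInterval === 0;
    tick++;
    if (shouldRender) {
      renderFrame(timestamp);
    }
    return shouldRender;
  };
}

// In a page, the returned handler would be driven by rAF, e.g.:
//   const step = makeDecimatedLoop(2, draw);
//   function pump(ts) { step(ts); requestAnimationFrame(pump); }
//   requestAnimationFrame(pump);
```

Counting ticks rather than measuring time keeps the cadence steady, but the resulting frame rate still depends entirely on how often rAF fires.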

The demo page http://clbri.com/dump/10kCubes_wasm_webgl2/10kCubes.html can be used to test out emscripten_set_main_loop_timing in practice. That function naturally only works with asynchronous callback-based rendering.

And now with OffscreenCanvas, we are adding an emscripten_webgl_commit_frame() function to allow calling OffscreenCanvasBasedGLCtx.commit().

The above is kind of good, and allows some kind of way to decimate to every second rAF or similar on main thread async based rendering loops. However, it falls short in one dramatic fashion: there is no API to ask what the current rAF() firing rate is, so if you decimate by a factor of two to target 30Hz, you can't do that just by a statically made choice of emscripten_set_main_loop_timing(EM_TIMING_RAF, 2);, since the user might be on a 120Hz/144Hz display, where that method would get you 60Hz/72Hz.

It is not currently possible to satisfactorily benchmark-estimate what the current refresh rate is on the web. The refresh rate is not static: if one moves a browser window between two displays that have different refresh rates, the rAF() rate should be allowed to change on the fly, and the same goes for VR displays. As a result, one has to keep benchmarking the rAF() rate on the go, but if one also renders the game screen while benchmarking, that might skew the result (e.g. the game renders too much while benchmarking, leading to a bad estimate). That becomes an exploration vs exploitation style of problem, where one should periodically allocate some time to explore the rAF rate, but also maximize the time spent exploiting the detected rAF rate (producing the game experience for the user).
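To show what such benchmarking typically amounts to, here is a rough sketch that estimates the refresh rate from a window of rAF timestamps using the median frame delta. estimateRefreshRate is a hypothetical helper, and the median is just one assumed way to discard occasional long frames; it does nothing about the skew caused by rendering while measuring.

```javascript
// Estimate the refresh rate (Hz) from consecutive rAF timestamps (ms).
// Returns NaN when there are fewer than two samples.
function estimateRefreshRate(timestamps) {
  if (timestamps.length < 2) {
    return NaN;
  }
  const deltas = [];
  for (let i = 1; i < timestamps.length; i++) {
    deltas.push(timestamps[i] - timestamps[i - 1]);
  }
  deltas.sort(function (a, b) { return a - b; });
  const mid = Math.floor(deltas.length / 2);
  // The median is robust against a few dropped or delayed frames.
  const median = deltas.length % 2 === 1
    ? deltas[mid]
    : (deltas[mid - 1] + deltas[mid]) / 2;
  return 1000 / median;
}
```

The estimate goes stale as soon as the window moves to another display, so it would have to be recomputed continuously, which is the exploration cost described above.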

In native APIs, there are ways to both a) query the different supported refresh rates, and b) set the desired refresh rate of the display mode when moving to fullscreen on target display (or fail if not supported). In native world, that guarantees c) one knows what the refresh rate is at any time. Options (a) and (b) might be too convoluted to start doing for OffscreenCanvas here, but I propose that we would now add (c) while doing OffscreenCanvas. Pseudo-speccing,

read-only properties float Canvas.refreshRate and float OffscreenCanvas.refreshRate
specify what the current native animation rate of the Canvas element is.
This value does not need to stay static during the lifetime of the Canvas/OffscreenCanvas, but can change at runtime.

Then, to accommodate for synchronous commits, we would have


int GLContextBasedOnOffscreenCanvas.commitInterval;

  - Specifies the vsync swap delay that calls to GLContextBasedOnOffscreenCanvas.commit(); perform.
    0: present immediately without waiting for vsync, 1: wait until vsync,
    2: wait for every second vsync, 3: wait for every third vsync, ....
    Default value: 0.

Finally, since blocking on the main thread is bad, and following the direction shown by SharedArrayBuffer specification with its Atomics.wait() API that disallows waiting on the main thread ([1], [2]), I propose we would also add


If GLContextBasedOnOffscreenCanvas.commitInterval > 0
  and GLContextBasedOnOffscreenCanvas.commit() is called on the main browser thread,
  a JavaScript exception "CannotBlockOnMainThread" is thrown.

I think this would solve the various synchronous swapping needs in Web Workers. Code will be able to

ask the browser what the update rate of the Canvas is, in both main thread and web workers,
call .commit() on the main thread to migrate to explicit swapping behavior on the main thread, as opposed to being restricted to the current implicit swapping behavior that WebGL rendering has (exit event loop -> autoswap)
run a main loop in a worker and specify vsyncless rendering by setting GLContextBasedOnOffscreenCanvas.commitInterval = 0; and call .commit()
run a main loop in a worker and specify vsync-based rendering by setting GLContextBasedOnOffscreenCanvas.commitInterval = 1; and call .commit()
observe changes to display refresh rates when moving between displays or when moving to present to fullscreen by querying Canvas/OffscreenCanvas.refreshRate.
explicitly target a 30Hz or 60Hz rendering rate independent of whether native refresh rate is 60Hz or 120Hz, by computing proper swap interval based on Canvas/OffscreenCanvas.refreshRate.
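The last point could be sketched as follows. Note that refreshRate and commitInterval are only the properties proposed in this comment, not existing APIs, and commitIntervalFor and effectiveRate are hypothetical helper names.

```javascript
// Pick the swap interval that gets closest to targetFps on a display
// running at refreshRate Hz. Never returns less than 1 (vsync-locked).
function commitIntervalFor(refreshRate, targetFps) {
  return Math.max(1, Math.round(refreshRate / targetFps));
}

// The frame rate actually obtained with a given interval.
function effectiveRate(refreshRate, interval) {
  return refreshRate / interval;
}

// Hypothetical usage with the proposed properties:
//   ctx.commitInterval = commitIntervalFor(offscreenCanvas.refreshRate, 30);
//   ctx.commit();
```

On 60Hz and 120Hz displays this yields intervals of 2 and 4, both giving exactly 30Hz. On a 144Hz display the closest choice is 5, giving 28.8Hz, so an exact 30Hz target is not always reachable with integer intervals.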

I'm not sure I understand the problem. The resolution of the promise returned by commit() is tied to vsync and is independent of the rate at which the main thread refreshes the page. The way we've implemented this in Chrome is basically equivalent to triple buffering. The first time commit() is called, it resolves its promise immediately, in order to allow the accumulation of a two frame backlog in the presentation pipeline. This gives us really good smoothness, at the expense of latency.

I'm not sure if there is a problem even, I am just not too familiar with all the relevant browser internals. Ideally, we'd have that Web Worker code that calls GLContextBasedOnOffscreenCanvas.commitInterval = 0/1; and .commit() to be easily able to avoid any type of timing processing in the middle, but go straight to the GPU API present calls and then back to the user. In Web Workers that sounds like a possibility. On Windows, if one calls Win32 API Sleep(0); between two rendered frames, it is already a game lost on performance guarantees, which is why I'm kind of poking at the direction that OffscreenCanvas+requestFullscreen+No CSS/DOM tricks+.commit() in Web Worker would directly have facilities to tie to real presentation interval. A Web Worker synchronously blocking on a .commit() for vsync should be fine, since if one wants to process something parallel to that, one can always use another Worker.

Do you think any of this makes sense? :)
