rAF and microtasks (discussion thread)

kainino0x commented 4 years ago

Opening this as a new issue because I don't want to derail #506. cc @magcius

Originally posted by @litherum in https://github.com/gpuweb/gpuweb/pull/506#issuecomment-590640169

So there is a microtask queue drain between the rAF() callback and the present.

Last I checked, Chromium actually implements this the other way (i.e. incorrectly; I'm assuming your reading of the spec is correct). It does not drain the microtask queue between the rAF callback and the present.

I was deep in an email conversation on this topic several months ago with @AshleyScirra, which I irresponsibly dropped the ball on and haven't replied (sorry, Ash!). Ash, evidently per spec (sorry I didn't check before), your desired behavior is actually the correct one, and Chromium is buggy.

That said... I'm actually very skeptical that the current spec text is how it ought to be. Take some rAF callback like this:

async function frame() {
  requestAnimationFrame(frame);
  drawScene();
  await updateHUDState();
  drawHUD();
}
requestAnimationFrame(frame);

If you take the spec way, then whether your HUD renders in the correct frame or not depends on whether the promise resolves as a task or as a microtask. While this behavior is very much present and observable in JS today, it almost never makes a difference in correctness except in quite arcane cases, which allows the majority of developers to never think about tasks vs microtasks.
If you take the Chromium way, then your HUD never renders. This can be annoying to code around, but at least it means you never flakily present in the middle of drawing - the behavior is consistent and you know you have to fix it.

kdashg commented 4 years ago

This should be in whatwg/html, I think, as I believe this is true in general. (not just for webgpu)

kainino0x commented 4 years ago

Definitely. This issue just exists as a discussion thread, but if we end up deciding a change is needed, then we can take it to html.

magcius commented 4 years ago

I agree that Chrome's interpretation seems more immediately useful for me, and honestly matches my expectations of what a sane application framework would do.

AshleyScirra commented 4 years ago

A better example is this:

async function frame() {
  requestAnimationFrame(frame);
  drawScene();
  await Promise.resolve();  // no-op await
  drawHUD();
}
requestAnimationFrame(frame);

The main point for me is that if you await a resolved promise in rAF, it should still present the results of drawHUD(). As it stands Chrome will return from frame() as soon as it hits the await and miss presenting the results of drawHUD() even though nothing async really happened. Draining microtasks would fix this as it will then reach the end of drawHUD() before presenting.

As it stands, in Chrome you must not use any await at any point anywhere in the entire call tree made from a rAF callback - potentially covering a large amount of complex code in a real framework - even if nothing async really happens, otherwise you miss presentation. This is quite a serious limitation since awaiting to pause the render loop is actually a useful feature.

I think it's reasonable to say that all rendering in the rAF callback must finish within the same task, rather than microtask. The important thing is not whether the thing you await resolves as a task or microtask; the important thing is that awaiting resolved promises does not cause it to miss presentation.

magcius commented 4 years ago

Actually, for Kai's initial case, isn't it actually worse than that? Let's assume that the promise is guaranteed to resolve between the current frame and the next frame. Since requestAnimationFrame was called before waiting on updateHUDState(), this would mean that drawHUD would be called twice the next frame. I assume the order is defined, but I'm not enough of a spec guy to know the details.

This is quite a serious limitation since awaiting to pause the render loop is actually a useful feature.

I'm curious how you would use this in practice. Because you don't have any application-level control over microtask ordering (you can't set priorities of things you await, basically), and you can't guarantee or even validate that an arbitrary Promise will complete in a microtask rather than a task, it feels a bit like playing with fire to me. Using await to pause the render loop in an attempt to switch to some other work feels weird, because you don't know what work you're switching to or when the control would come back to requestAnimationFrame.

AshleyScirra commented 4 years ago

Here's an example of what I mean:

async function frame()
{
    drawScene();
    await waitForPauseMenuToResume();
    requestAnimationFrame(frame);
}

Normally when the game is running waitForPauseMenuToResume will just exit and return a resolved promise. Then when you pause the game, it will pause the render loop and only once unpaused will it proceed and call rAF() again.

However as it stands in Chrome, with this code running normally and not paused, it misses presentation every frame solely because it ran an await. Presenting at the task level solves this.

kainino0x commented 4 years ago

Actually, for Kai's initial case, isn't it actually worse than that? Let's assume that the promise is guaranteed to resolve between the current frame and the next frame. Since requestAnimationFrame was called before waiting on updateHUDState(), this would mean that drawHUD would be called twice the next frame. I assume the order is defined, but I'm not enough of a spec guy to know the details.

That's correct. If it was already resolved, then the microtask occurs right after the frame is presented, so it ends up before the next rAF callback. If it was not, then it won't happen until an arbitrary point in an arbitrary later frame (as a microtask of whatever task resolves the promise). That's still always between one frame's present and the next frame's rAF callback (the order never swaps).

kainino0x commented 4 years ago

Normally when the game is running waitForPauseMenuToResume will just exit and return a resolved promise. Then when you pause the game, it will pause the render loop and only once unpaused will it proceed and call rAF() again.

I think this case is already OK. The timeline should be (in chrome):

animation frame
- rAF-callback
- present
- microtasks:
- rest of the code in frame: requestAnimationFrame(frame)
any async work
animation frame
- rAF-callback (because requestAnimationFrame has definitely already been called)
- ...

AshleyScirra commented 4 years ago

OK, but then this ordering is broken?

async function frame()
{
    await waitForPauseMenuToResume();
    drawScene();
    requestAnimationFrame(frame);
}

It seems like a nasty gotcha.

kainino0x commented 4 years ago

Yes.

magcius commented 4 years ago

Assuming I'm following the large amount of unspoked-but-assumed code correctly ( ;) ), if the game was paused, then your pause menu would never be drawn, even if Chrome followed the existing spec.

It's hard because I'm sure you have more sensical use cases here and collapsing that down into a shorter snippet to communicate what you mean is tricky.

kainino0x commented 4 years ago

I was guessing in that example the pause menu is rendered as DOM, not part of the canvas.

AshleyScirra commented 4 years ago

There are lots of other reasons to await in a rAF callback: going to storage to load a savegame, asynchronously loading textures for the next scene, posting to the DOM to use an API not in a Worker when using OffscreenCanvas, and so on. In all these cases it's the intent that these operations suspend the render loop.

We ran in to exactly this issue with our engine in Chrome, and now we have a big pile of unintuitive code to make sure no await statements appear anywhere in the critical rAF call tree.

litherum commented 4 years ago

Some references to where this is spec'ed:

The HTML5 spec defines the order these things occur in. Step 10.11 is "run the animation frame callbacks." Within that, step 3.3 is "Invoke callback." Within that, step 11 is the actual javascript function call, and step 14.2 is "Clean up after running script." Within cleaning up after running script, step 3 is "perform a microtask checkpoint" which drains the microtask queue.

kdashg commented 4 years ago

Yep, it's clear from my read as well that Chrome is out of spec, so paths forward are one of:

Bring Chrome into compliance: https://bugs.chromium.org/p/chromium/issues
Chrome petitions to change the spec: https://github.com/whatwg/html/issues

Firefox implements the spec here. Interestingly, it seems to used to be the other way around: https://github.com/whatwg/html/issues/2637

For example, this is why it's sort of annoying to implement requestPostAnimationFrame as in https://hackmd.io/lvtOckAtSrmIpZAwgtXptw#Use-requestPostAnimationFrame-not-requestAnimationFrame , rather than just enqueuing a microtask.

magcius commented 4 years ago

Interestingly, it seems to used to be the other way around: whatwg/html#2637

Interesting. Is there any discussion as to why this changed?

kainino0x commented 4 years ago

Why does it make the requestPostAnimationFrame polyfill more difficult? setTimeout(task) should enqueue a non-micro task (it will be after any already-enqueued tasks).

Would Chrome's behavior make it easier to get the timing 'right'* (so you can just Promise.resolve().then(task))?

kainino0x commented 4 years ago

Interestingly, it seems to used to be the other way around: whatwg/html#2637

I'm not sure, but I think this might be a little more nuanced - it's talking about the timing relative to relayout, not just present.

kdashg commented 4 years ago

setTimeout's nesting rules make it unpalatable, because you might run afoul of them without realizing.

Chrome's non-spec behavior would make this particular functionality easier, at the expense of needing the painful anti-async changes that @AshleyScirra. As the spec is today, you can make the choice yourself between always-cleaned-up microtasks, and deferred full (non-micro) tasks.

kainino0x commented 4 years ago

setTimeout's nesting rules make it unpalatable, because you might run afoul of them without realizing.

Shouldn't it be okay in your rPAF polyfill, because setTimeout is always called from rAF, and therefore is not nested?

kdashg commented 4 years ago

postMessage-as-setTimeout-zero has been an open secret for ten years: https://dbaron.org/log/20100309-faster-timeouts It's mentioned explicitly in the setTimeout docs on MDN: https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/setTimeout#Reasons_for_delays_longer_than_specified

So yes, it's "probably fine" if indeed it's called from rAF, but I like reliable primitives. :)

kainino0x commented 4 years ago

I think that postMessage-as-setTimeout-zero needs to be fixed so it is throttled, and setTimeout-from-rAF needs to be well specced and implemented so it is not throttled.

magcius commented 4 years ago

There was also a proposal for window.setImmediate which had similar behavior to postMessage, but guaranteed zero... I guess rPAF is that under a new name.

Is there a microtask drain with this proposed rPAF, or with any of the polyfills? If people switched to rPAF, would code work or not?

I think it's confusing enough and overlaps with enough other work inside WHATWG that it's worth an issue in whatwg/html.

kainino0x commented 4 years ago

rPAF isn't a setImmediate, it's a callback that happens "right" after the rAF-callback, present sequence.

https://github.com/WICG/requestPostAnimationFrame/blob/master/explainer.md

kdashg commented 4 years ago

I think that postMessage-as-setTimeout-zero needs to be fixed so it is throttled, and setTimeout-from-rAF needs to be well specced and implemented so it is not throttled.

You're welcome to file bugs, but the state of the world sort of works for me. FWIW dbaron is Mozilla's TAG rep iirc, though it's no guarantee they stand by the setZeroTimeout thing this many years on.

AshleyScirra commented 4 years ago

This may be going off on a tangent somewhat, but if we have queueMicrotask, why not also add queueTask? It would avoid the need for having to use timers or postMessage.

kainino0x commented 4 years ago

Except for the throttling issue, my understanding is that is what setTimeout does. I think queueTask would need the same throttling anyway.

magcius commented 4 years ago

Also question: does any of this apply to XRSession.requestAnimationFrame? All of it?

kainino0x commented 4 years ago

I don't know how that rAF is specced, but we should definitely make sure it has the same ordering between presentation and microtasks.

AshleyScirra commented 4 years ago

We got queueMicrotask even though you can do Promise.resolve().then(...) - I think there's a good argument that it illustrates the intent better, hence it may well make sense to have queueTask as well.

magcius commented 4 years ago

Empirical evidence (i.e. me trying it just now) currently suggests that you cannot do the same rPAF trick with setTimeout on an XRSession.requestAnimationFrame. This probably needs to be brought up with whatwg/html then to ensure that we're consistent about timing.

@toji

gpuweb / gpuweb

rAF and microtasks (discussion thread) #596