whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.02k stars 2.62k forks source link

Ordering for Worker termination and Atomics.waitAsync promises #8276

Open syg opened 2 years ago

syg commented 2 years ago

As discovered by @iainireland in https://bugs.chromium.org/p/v8/issues/detail?id=13258

Background

Atomics.waitAsync is a stage 3 TC39 proposal which adds an async, Promise-returning version of Atomics.wait. Atomics.waitAsync, like Atomics.wait, is basically the Linux futex API, and works off of indices into a SAB. The interesting thing about Atomics.waitAsync is that multiple Workers may be waiting on the same index, so an Atomics.notify can resolve Promises from other agents.

Surprising behavior

Suppose there are two workers, w1 and w2. Each worker has an unresolved Atomics.waitAsync Promise. Atomics.notify resolves Promises in FIFO order, and will resolve w1's Promise then w2's Promise. After a worker terminates, all its unresolved Promises are removed from the various queues tracked by Atomics.waitAsync. The main thread then executes the following code:

L1: w1.terminate();
L2: Atomics.notify(sab, idx, 1); // Notify 1 waiter

There is a race here. Worker termination steps happen "in parallel", so two interleavings are possible:

  1. w1 terminates; its unresolved Promise is removed from the queue tracked by Atomics.waitAsync
  2. L2 notifies w2's Promise, since that's the only Promise left in the queue

OR

  1. L2 notifies w1's Promise
  2. w1 terminates before w1's Promise handler gets to run, making L2 look silently swallowed

In weak memory model lingo, L1 is program-order (i.e. thread-local order) before L2, and thus L1 happens-before L2. It is surprising that even though there is a clear ordering relationship here from the perspective of the main thread, L2 may still get swallowed due to an underlying race.

There's no ordering constraints around any of this stuff in either HTML or ecma262, so this is allowed right now. Should we do something here to try to enforce an ordering in spec?

(There are probably reasonable implementation fixes. One can imagine a fix where the agent calling Worker#terminate synchronously cleans up the Atomics.waitAsync Promise queues instead of doing it async, or something.)

domenic commented 2 years ago

I'm not quite sure I understand the observable consequences of the race. L2 is supposed to notify both workers and fulfill their promises, right? In the first race configuration:

In the second race configuration:

So as far as I can tell they are the same. I must be missing something?

iainireland commented 2 years ago

Atomics.notify has a count parameter indicating the number of waiters to notify. If you only notify one waiter, then w2's Promise isn't resolved unless w1 has already finished terminating.

syg commented 2 years ago

@iainireland is right, this is a problem with "notify one". It probably also bears to note that these were designed to implement condition variables and mutexes and so forth, and for condition variables, "notify one" and "notify all" are the common cases.

Edit: Clarified original post to pass 1 for count to Atomics.notify.

domenic commented 2 years ago

OK, I see. That does seem like a bug in the Atomics.notify spec then, and I agree it should be fixed. If there needs to be some additional work in HTML to fix it, I'd have no problem with that.