w3c / webrtc-pc

WebRTC 1.0 API
https://w3c.github.io/webrtc-pc/
Other
438 stars 115 forks source link

Can onicecandidate fire before SLD.then(code)? #2574

Closed henbos closed 4 years ago

henbos commented 4 years ago

ICE candidates of a pc can be generated in response to pc.setLocalDescription(answer).

Both processing the result of an SLD and processing ice candidates being generated and the event fires happens by queuing a task in response to the SLD operation. So one would assume, that SLD happens first and ice candidate events fires second.

So clearly code1 would execute before code2 in "pc.onicecandidate = e => { code2 }; pc.SLD.then(code1);"

OR WOULD IT?!

While p is resolved first and the event is fired second, p.then(code) only runs at the (start/end?) of the task execution cycle, whereas JS event handler are executed immediately.

In Chrome (see bug) I believe the promise is resolved first, then the onicecandidate event handler runs, then the promise's then()-code runs. Is this spec-compliant or not?

This causes issues in WPTs. Example:

// Setup
caller.onicecandidate = e => { callee.addIceCandidate(e.candidate) };
callee.onicecandidate = e => { caller.addIceCandidate(e.candidate) };

// ...

// In response to an offer from caller:
const answer = await callee.createAnswer();
await callee.setLocalDescription(answer);
await caller.setRemoteDescription(answer);

What happens:

  1. callee.SLD's promise gets resolved first, but the promise.then() code is not exected yet, i.e. we do not start to do caller.SRD just yet.
  2. callee.onicecandidate fires and caller.addIceCandidate happens, which doesn't work because we haven't performed caller.SRD yet.
  3. caller.SRD is executed because this is what happens after "await".

Question:

  1. Is Chrome spec-compliant or not?
  2. Is this a testing-only issue or could this cause races in real applications too (assuming slow processing time of SLD and 0 ms connection with a super-computer)?
henbos commented 4 years ago

Tagging @jan-ivar and @alvestrand to take a look

henbos commented 4 years ago

Regardless of the correct answer is, I think we need to clarify the order here, since we typically just says "when bla bla happens, queue a task to execute the following steps"

henbos commented 4 years ago

This lead to a strange-looking change in the WPT helper: https://chromium-review.googlesource.com/c/chromium/src/+/2404346

henbos commented 4 years ago

Firefox probably avoids this issue by performing "SLD" with a posted task, so the promise.then() code happens before the task to do onicecandidate

fippo commented 4 years ago

Is this a testing-only issue or could this cause races in real applications too (assuming slow processing time of SLD and 0 ms connection with a super-computer)?

This is largely a question about signalling protocols and when you signal the local description. You must not signal ice candidates before the initial offer, otherwise this will cause addIceCandidate before setRemoteDescription (assuming the signaling protocol gets that far even).

You can do either of two things here. First,

pc.createOffer()
  .then(offer => pc.setLocalDescription(offer))
  .then(() => /* signal offer */)
})

In that case, if onicecandidate fires before SLD resolves its bad as described above.

The other case is that you assume that setLocalDescription can not throw with the given offer:

pc.createOffer()
  .then(offer => {
      /* signal offer */
     return pc.setLocalDescription(offer);
  })
  .then(() => /* no op */)

Now we have largely removed the reasons for SLD erroring for reasons between createOffer and SLD. In practice they still exist however because munging.

alvestrand commented 4 years ago

I think this test:

pc.onicecandidate = assert_unreached; await pc.setLocalDescription() pc.onicecandidate = "we're OK, thank you"

should test the Right Thing.

alvestrand commented 4 years ago

Test written - https://chromium-review.googlesource.com/c/chromium/src/+/2423886

Comments welcome on whether there are trickeries with .then vs await that will cause different timings.

henbos commented 4 years ago

Comments welcome on whether there are trickeries with .then vs await that will cause different timings.

I strongly believe that "await p; code;" is equivalent to "p.then(code);" in terms of timing, but I don't know this for a fact.

jan-ivar commented 4 years ago

(back from PTO) no, icecandidate cannot fire before SLD, because we made it so.

This behavior is critical to ensure that signaling order, e.g. offer candidate candidate candidatenaturally translates into async methods on callee in the same (correct) order.

await p; foo(); is syntactic sugar for p.then(foo).

Getting this fixed in Chrome would be great, since it's led to apps and wpt implementing queuing of ice which is an anti-pattern.

henbos commented 4 years ago

@jan-ivar, the [[EarlyCandidates]] slot is only added to if candidates are generated while the descriptions and still pending and before the promise is resolved. The race in question is when candidates are fired AFTER the promise is resolved but BEFORE the promise's then() code is executed.

henbos commented 4 years ago

E.g. if you execute SLD truly in parallel and both "resolve SLD promise" and "fire event" happen in two PostTasks that happen in rapid succession

henbos commented 4 years ago

My belief is that "resolve promise p" does not synchronously execute p.then(). Rather the "then" will happen at the next task execution cycle, or something like that

jan-ivar commented 4 years ago

The race in question is when candidates are fired AFTER the promise is resolved but BEFORE the promise's then() code is executed.

@henbos The promise resolving and foo being invoked, while not synchronous, both happen on the same microtask queue, i.e. before the end of the current run of the JS event loop.

The surfacing of ice candidates happen on a queued task (i.e. main event loop).

Therefore they cannot race.

henbos commented 4 years ago

You may have a better understanding than me, but I don't think I follow. Both tasks are queued in the same maner:

If description is applied successfully, the user agent MUST queue a task that runs the following steps:

  • Do some cool stuff.
  • ...
  • Resolve p with undefined.

And:

When the ICE Agent indicates that a new ICE candidate is available for an RTCIceTransport, either by taking one from the ICE candidate pool or gathering it from scratch, the user agent MUST queue a task that runs the following steps:

  • Do some cool stuff.
  • ...
  • Bla bla surface the candidate, i.e. "Fire an event".

In both cases we do "queue a task that". So assuming we've queued a task to resolve p and we've queued a task to fire an event, when we do the "execute all tasks" part of the task queue we'll end up making p resolved and the event fired. I'm not sure it matters if then() is handled at the end of the current JS event loop or the start of the next one, as long as resolving p does not synchronously perform then() we have a problem?

Is there something I'm missing? Like is there a limit on the number of tasks that can be handled in a single JS task execution cycle? Or is there something say "if any promises have been resolved, stop executing further queued tasks currently pending"?

jan-ivar commented 4 years ago

Both "queue a task" statements above queue tasks on the main event loop.

Crucially, the main event loop's microtask queue is emptied at the end of each main task.

Let's say both "queue a task" you mention happen. The main event loop now contains [T1, T2].

  1. We execute T1 (SLD success):
    1. The "Resolve p with undefined" queues a microtask on the main event loop's microtask: [M1].
    2. We run off the end of JS, where we empty the microtask queue. This executes M1:
      1. This runs foo in the then
  2. We execute T2 (surface ICE candidate).

Check out In the Loop explaining how microtask queues block rendering.

henbos commented 4 years ago

Thank you! In that case, I believe the spec is working as intended, and if Chrome is operating the way I suspect that it is, then Chrome has a bug. Let's see if the bots are flaky or not with Harald's new test.

henbos commented 4 years ago

The spec is glorious. To be seen if Chrome is.