matrix-org / waterfall

A cascading stream forwarding unit for scalable, distributed voice and video conferencing over Matrix
Apache License 2.0
98 stars 5 forks source link

Cancelling screen-sharing doesn't propagate to the receiver #98

Open SimonBrandner opened 1 year ago

SimonBrandner commented 1 year ago
  1. A starts screen-sharing
  2. B sees screen-sharing
  3. A stops screen-sharing
  4. A performs a successful re-negotiation with the SFU
  5. B sees no changes - no messages from the SFU
SimonBrandner commented 1 year ago

Somehow I don't see any remote track closed in the logs

SimonBrandner commented 1 year ago

But I do see No RTP on subscription for...

SimonBrandner commented 1 year ago

It looks like the negotiating is going completely awry here... We send an offer with the transceiver being recvonly and the SFU also responds with recvonly (i.e. sendonly from our POV) and it somehow gets accepted and so the SFU still thinks it will be able to receive new packets and so we get no remote track closed

daniel-abramov commented 1 year ago

Interesting! Can/should we do something regarding that on the SFU side? Currently, we don't munge the SDP in any way, we pass them directly to Pion.

SimonBrandner commented 1 year ago

Possibly we can play with the transceivers though experience tells me the negotiation code probably isn't completely wrong and it's us screwing up somehow

SimonBrandner commented 1 year ago

FWIW, the cause seems to be https://github.com/pion/webrtc/issues/2226

daniel-abramov commented 1 year ago

Interesting.

OT: It seems like the same issue is present in the webrtc-rs, the author of the GitHub issue is the one who also contributes/maintains webrtc-rs at the moment.

dbkr commented 1 year ago

Yeah, I think this is basically correct — as per that bug, pion basically just doesn't support tracks (or at least receivers) being paused and then reactivated. There is a further bug whereby the receiver is only stopped if the transceiver is set to inactive and not when it's set to recvonly (ie. sendonly from the SFU's PoV). In other words:

For now, I suggest we work around this by not re-using transceivers with the SFU.

An interesting question is what livekit does here.

daniel-abramov commented 1 year ago

For now, I suggest we work around this by not re-using transceivers with the SFU.

So that's basically making EC use sendonly transceiver as you described in the second point, right? (i.e. each time we start the screen sharing again, that would result in a new onTrack etc, which is not a problem per se despite being a bit inefficient).

dbkr commented 1 year ago

Not really - using sendonly vs sendrecv makes pion close the track, so fixes the screenshare getting stuck. Re-using transceivers is then necessary because pion can't restart the track once it's closed it. But yes, the effect is a new onTrakc every time screen sharing starts.

dbkr commented 1 year ago

Worked around client side in https://github.com/matrix-org/matrix-js-sdk/pull/3120. @EnricoSchw was looking at the causes of this in pion.

EnricoSchw commented 1 year ago

The main Issue that the track IDs can change or assigned to another transceiver we will solve with an custom media identifier. The resolution of this ticket depends on these adjustments https://github.com/vector-im/element-call/pull/943