nextcloud / spreed

🗨️ Nextcloud Talk – chat, video & audio calls for Nextcloud
https://nextcloud.com/talk
GNU Affero General Public License v3.0
1.63k stars 435 forks source link

[SIP] Video not displayed if initially blank and only received after some time #13565

Open fancycode opened 5 days ago

fancycode commented 5 days ago

Noticed this while working with https://github.com/nextcloud/spreed/pull/13486 on the SIP bridge.

Currently the video unmute event is triggered if more than 2000 bytes are received within a second of the stream starting: https://github.com/nextcloud/spreed/blob/d63691af4228767d37249d4649f04049305e230f/src/utils/webrtc/webrtc.js#L1231-L1241

If less than 2000 bytes (but more than 0) are received, the video is assumed as "disabled" and further checks are disabled: https://github.com/nextcloud/spreed/blob/d63691af4228767d37249d4649f04049305e230f/src/utils/webrtc/webrtc.js#L1313

While this doesn't happen all the time, when it happens, only a datachannel unmute event will make the video appear in the callview: https://github.com/nextcloud/spreed/blob/d63691af4228767d37249d4649f04049305e230f/src/utils/webrtc/webrtc.js#L1575

This is a problem for SIP video which might be empty / black at the beginning and only start after some time, and datachannels are not supported by the SIP bridge.

Adding here for tracking. Maybe @danxuliu has an idea if there is a simple way to support this case without having to implement datachannels in the SIP bridge.

danxuliu commented 5 days ago

Actually that detection code was added back in the day to work around some problems sending the unmute events, but my idea was always to get rid of it :-)

Although in the beginning audio/video enable/disabled was notified through data channels now the clients do that also through signaling messages (well, right now the Android client does not, but it will do soon ;-) ). Would the SIP bridge be able to send a signaling message when audio/video is enabled/disabled? Then it should™ work independently of the data channels and that detection code.

Only thing to keep in mind is that the signaling messages are different than the data channel messages. Data channel messages are audioOn/audioOff/videoOn/videoOff, while signaling messages are type mute/unmute with payload { name: 'audio/video' }.

fancycode commented 5 days ago

We could use the incall flags, they are already set correctly for the SIP users: https://github.com/nextcloud/spreed/blob/0a0ced140c4fb2ce2b9ca6b273b7120d9c6e8b3d/src/constants.js#L132-L138

The internal client has IN_CALL and WITH_AUDIO, the (virtual) phone users have IN_CALL and WITH_PHONE. It would be easy to switch the internal client to also set WITH_VIDEO.

I didn't check if the flags are correctly set for web- / mobile clients already, but they should: https://github.com/nextcloud/spreed/blob/0a0ced140c4fb2ce2b9ca6b273b7120d9c6e8b3d/lib/Controller/CallController.php#L200-L203

fancycode commented 5 days ago

Although in the beginning audio/video enable/disabled was notified through data channels now the clients do that also through signaling messages [...]

So the clients have to use three different ways to notify about their state: the "incall" flags, the datachannel events and the signaling events 🙈

danxuliu commented 3 days ago

We could use the incall flags

The WITH_AUDIO and WITH_VIDEO flags say whether the participant is (or is expected to be) publishing audio/video, but not whether the audio/video is enabled or disabled. But if needed I guess we could do a special handling in the clients to assume that video is always enabled for the internal client of the SIP bridge if the flag is set. It would be necessary to be able to differentiate the internal client of the SIP bridge from other internal clients, though, which I do not know if it is currently possible.

So the clients have to use three different ways to notify about their state: the "incall" flags, the datachannel events and the signaling events 🙈

I expect to finally remove sending the audio/video state through data channels for the next Talk major release (it requires the Android app to both send and receive the state with signaling messages, but that should be ready soon). Actually once the Android app supports that it would not be needed to send data channels either in previous versions (at least, until the WebUI started to send and receive both data channel and signaling messages for the state), because mobile apps are backwards compatible and they are expected to be used in their latest major version even against older servers. But well, we will not mess with that and data channels will be kept also in older versions.

So technically the WebUI and the mobile clients will only need to use signaling messages to notify about their state (if we talk about audio/video enabled/disabled and ignore the speaking state, which is sent only through data channels... 🙈).

Having said that... at some point the state may be set using transient states instead of signaling messages :-P But that should be a full replacement rather than adding yet another way of sending the state ;-)