selkies-project / selkies-gstreamer

Open-Source Low-Latency Accelerated Linux WebRTC HTML5 Remote Desktop Streaming Platform for Self-Hosting, Containers, Kubernetes, or Cloud/HPC
Mozilla Public License 2.0

[META] Optimize the WebRTC stack to the maximum #157

Open ehfd opened 1 month ago

ehfd commented 1 month ago

Linked with #160, #153, #152, #39, #34, #30

Also read: https://github.com/m1k1o/neko/issues/371

As of the v1.6.0 release, we have much higher confidence in our performance optimizations in the WebRTC stack. We have found a way to eliminate jitter buffer latency in the WebRTC decoder using playout-delay and jitterBufferTarget, along with many other measures that stabilize and improve the video and input (DataChannel) stack.

Moreover, we have incorporated smaller frames for the Opus codec to see if the latency improves (tracked in #153), but NetEQ in Chrome mostly works on its own.

There are still multiple interventions that could push this WebRTC stack closer to its best achievable performance.

Backend:

https://issues.chromium.org/issues/40198264

YUV 4:4:4 is possible in WebRTC; Nutanix Frame implemented it within Chromium quite some time ago. However, color in YUV 4:2:0 (#160) should be solved first, as there is no legitimate reason for color in YUV 4:2:0 to differ from the original source by more than +/- 1.

https://multi.app/blog/making-illegible-slow-webrtc-screenshare-legible-and-fast https://multi.app/blog/measuring-shared-control-latency

Currently, the Opus queue is commented out, but queues may have useful features. Beyond re-investigating how effective queues are for Opus and what role they play in latency, queues in the video RTP payloaders may (or may not) also help during congestion, when latency spikes can persist for >5-15 seconds because the WebRTC decoder scrambles to decode very late frames instead of simply dropping them. An as-yet-unknown web browser configuration may also eliminate this situation entirely. Any solution must work well with infinite keyframe/GOP configurations and NACK/PLI with RTX.
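To make the "drop late frames instead of decoding them" idea concrete, here is a minimal sketch of a bounded queue that leaks stale entries. It is an illustration only, not GStreamer code: `LeakyFrameQueue`, `maxAgeMs` and `maxLength` are hypothetical names, and timestamps are passed in explicitly so the behavior is easy to reason about.

```javascript
// Hypothetical illustration: a bounded queue that discards stale frames
// rather than delivering them late. Not GStreamer or browser code.
class LeakyFrameQueue {
  constructor(maxAgeMs, maxLength) {
    this.maxAgeMs = maxAgeMs;   // frames older than this are dropped on pop
    this.maxLength = maxLength; // oldest frames are dropped beyond this size
    this.frames = [];
    this.dropped = 0;
  }

  push(frame, nowMs) {
    this.frames.push({ frame, enqueuedAt: nowMs });
    // Leak from the head (oldest first) when the queue is over capacity.
    while (this.frames.length > this.maxLength) {
      this.frames.shift();
      this.dropped += 1;
    }
  }

  pop(nowMs) {
    // Skip every frame that has waited longer than maxAgeMs.
    while (
      this.frames.length > 0 &&
      nowMs - this.frames[0].enqueuedAt > this.maxAgeMs
    ) {
      this.frames.shift();
      this.dropped += 1;
    }
    const entry = this.frames.shift();
    return entry ? entry.frame : null;
  }
}
```

With an infinite GOP, a drop like this would presumably have to be paired with a keyframe request (PLI), since later frames depend on the dropped ones; the sketch does not model that.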

WebRTC:

Use a=group:BUNDLE 0 1 2 3 ... and a=mid:0, a=mid:1, ... to establish one SDP session with independent streams for Audio, Video, DataChannel (m=application x UDP/DTLS/SCTP webrtc-datachannel), Microphone, Webcam, and other stream types, so that the streams neither interfere with each other nor perform audio/video sync.

Such as:

v=0
o=- 2 IN IP4 1.1.1.1
t=0 0
a=group:BUNDLE 0 1 2 3
a=fingerprint:sha-256
a=setup:actpass
m=audio x UDP/TLS/RTP/SAVPF 111 63
c=IN IP4 0.0.0.0
a=rtcp:x IN IP4 0.0.0.0
a=mid:0
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=sendonly
a=msid:id audio
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:63 red/48000/2
a=rtcp-fb:63 transport-cc
a=fmtp:63 111/111
a=ptime:10
m=video x UDP/TLS/RTP/SAVPF 96 97 101 102 98
c=IN IP4 0.0.0.0
a=rtcp:x IN IP4 0.0.0.0
a=mid:1
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:12 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=sendonly
a=msid:id video
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:101 H264/90000
a=rtcp-fb:101 transport-cc
a=rtcp-fb:101 ccm fir
a=rtcp-fb:101 nack
a=rtcp-fb:101 nack pli
a=fmtp:101 level-asymmetry-allowed=1;packetization-mode=1;sps-pps-idr-in-keyframe=1;profile-level-id=42e01f
a=rtpmap:102 rtx/90000
a=fmtp:102 apt=101;rtx-time=125
m=application x UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
a=mid:2
a=sctp-port:5000
a=max-message-size:262144
m=audio x UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtcp:x IN IP4 0.0.0.0

The main purpose is to isolate the different streams so that there is no audio/video sync at all (which inevitably adds latency), and to improve DataChannel performance by keeping it on an independent stream separate from the video, while still handling all of them through one TURN relay port (or other type of WebRTC port) in a single SDP.
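One way to sanity-check an SDP like the example above is to confirm that every m= section carries an a=mid and that the BUNDLE group lists exactly those mids. A minimal sketch, assuming plain string parsing is good enough for such a check; `checkBundle` and its return fields are hypothetical names, not part of selkies-gstreamer or any WebRTC library:

```javascript
// Hypothetical helper: compare the a=group:BUNDLE list with the a=mid of
// each m= section in an SDP string.
function checkBundle(sdp) {
  const lines = sdp.split(/\r?\n/).map((l) => l.trim()).filter(Boolean);
  let bundleMids = [];
  const sections = [];
  let current = null;

  for (const line of lines) {
    if (line.startsWith("a=group:BUNDLE")) {
      // "a=group:BUNDLE 0 1 2 3" -> ["0", "1", "2", "3"]
      bundleMids = line.split(/\s+/).slice(1);
    } else if (line.startsWith("m=")) {
      current = { kind: line.slice(2).split(" ")[0], mid: null };
      sections.push(current);
    } else if (current && line.startsWith("a=mid:")) {
      current.mid = line.slice("a=mid:".length);
    }
  }

  const sectionMids = sections.map((s) => s.mid);
  return {
    sections,
    // m= sections that never declared a mid
    missingMid: sections.filter((s) => s.mid === null).map((s) => s.kind),
    // mids declared on a section but absent from the BUNDLE group
    unbundled: sectionMids.filter((m) => m !== null && !bundleMids.includes(m)),
    // mids listed in the BUNDLE group with no matching section
    dangling: bundleMids.filter((m) => !sectionMids.includes(m)),
  };
}
```

Run against the excerpt above, a checker like this would report the final m=audio section as missing its a=mid and mid 3 as listed in the group but unused, since the excerpt stops before that line.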

https://www.rtcbits.com/2023/05/webrtc-header-extensions.html

a=extmap:1 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:2 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:4 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay

Note: http://www.webrtc.org/experiments/rtp-hdrext/color-space causes the Chrome WebRTC decoder to skip the Hardware Decoder and go straight to the Software FFmpeg decoder.

The above RTP Header Extensions are known to help with controlling latency and timing. They can be implemented in GStreamer so that the RTP payloaders emit them.

https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3549 https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3550

https://gstreamer.freedesktop.org/documentation/rtpmanager/rtphdrextclientaudiolevel.html https://gstreamer.freedesktop.org/documentation/rtpmanager/rtphdrextmid.html

SDP support in web browsers: https://codepen.io/kwst/full/yLaaxRy

draft-holmer-rmcat-transport-wide-cc-extensions-01 is enabled for video and audio when rtpgccbwe is active. abs-send-time and video-timing are not available in GStreamer. playout-delay has been implemented in a very restricted, temporary form in gstwebrtc_app.py, where only zero values can be sent (which is what we need anyway).
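For reference, a sketch of what a non-restricted playout-delay payload could look like. It assumes the layout given in the published description of the extension: three bytes holding a 12-bit minimum and a 12-bit maximum delay, each in units of 10 ms. `encodePlayoutDelay` is a hypothetical name and this is not the code in gstwebrtc_app.py; the RTP header extension ID/length byte is not included.

```javascript
// Hypothetical encoder for the playout-delay extension payload.
// Assumed layout: 12-bit MIN delay | 12-bit MAX delay, 10 ms granularity.
function encodePlayoutDelay(minMs, maxMs) {
  const GRANULARITY_MS = 10;
  const MAX_UNITS = 0xfff; // 4095 units, i.e. 40950 ms

  const min = Math.round(minMs / GRANULARITY_MS);
  const max = Math.round(maxMs / GRANULARITY_MS);
  if (
    !Number.isInteger(min) || !Number.isInteger(max) ||
    min < 0 || max > MAX_UNITS || min > max
  ) {
    throw new RangeError("delays must satisfy 0 <= min <= max <= 40950 ms");
  }

  // Pack both 12-bit fields into 24 bits, most significant byte first.
  const packed = (min << 12) | max;
  return Uint8Array.of((packed >> 16) & 0xff, (packed >> 8) & 0xff, packed & 0xff);
}
```

The all-zero case that is sent today corresponds to `encodePlayoutDelay(0, 0)`, which yields three zero bytes.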

a=imageattr:96 send [x=[1280:1920],y=[720:1080],fps=[30:60]]
a=imageattr:97 send [x=[1280:1920],y=[720:1080],fps=[30:60]]
a=rtpmap:98 flexfec-03/90000
a=rtcp-fb:98 transport-cc
a=fmtp:98 repair-window=10000000
a=ssrc-group:FEC-FR
a=max-message-size:262144

https://github.com/nextcloud/spreed/issues/6739 https://groups.google.com/g/discuss-webrtc/c/u7k1_hASS4Q https://stackoverflow.com/questions/57653899/how-to-increase-the-bitrate-of-webrtc https://groups.google.com/g/discuss-webrtc/c/udyHHPnrQMo https://github.com/pion/webrtc/discussions/1827 https://ekobit.com/blog/diving-deeper-into-webrtc-advanced-options-and-possibilities/ https://chromium.googlesource.com/external/webrtc/+/a6b99448eec51527eca0bc59f6da71061d02e807/webrtc/media/base/mediaconstants.cc https://groups.google.com/g/discuss-webrtc/c/ORJdeoFAaBE https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/sdp-ext/fmtp-x-google-per-layer-pli.md

The above links may contain irrelevant information: they discuss controlling the sender bitrate, but here webrtcbin is the sender and it does not use libwebrtc.

b=AS:300000
a=fmtp:96 sps-pps-idr-in-keyframe=1;x-google-max-bitrate=300000;x-google-min-bitrate=0;x-google-start-bitrate=12000
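If these parameters turn out to matter, they would have to be munged into the SDP. A minimal sketch of merging key/value pairs into an existing a=fmtp line, assuming simple string handling is acceptable; `setFmtpParams` is a hypothetical helper, and whether webrtcbin or the browser honors the x-google-* values here is exactly the open question noted above.

```javascript
// Hypothetical helper: merge parameters into the a=fmtp line of one payload type.
// Line endings are normalized to CRLF in the returned SDP.
function setFmtpParams(sdp, payloadType, params) {
  const prefix = `a=fmtp:${payloadType} `;
  return sdp
    .split(/\r?\n/)
    .map((line) => {
      if (!line.startsWith(prefix)) return line;
      // Parse "k1=v1;k2=v2" into an ordered map, keeping existing entries.
      const merged = new Map(
        line
          .slice(prefix.length)
          .split(";")
          .filter(Boolean)
          .map((kv) => {
            const i = kv.indexOf("=");
            return i === -1
              ? [kv.trim(), null]
              : [kv.slice(0, i).trim(), kv.slice(i + 1).trim()];
          })
      );
      for (const [key, value] of Object.entries(params)) {
        merged.set(key, String(value));
      }
      const body = [...merged]
        .map(([k, v]) => (v === null ? k : `${k}=${v}`))
        .join(";");
      return prefix + body;
    })
    .join("\r\n");
}
```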

https://neko.m1k1o.net/#/getting-started/configuration?id=webrtc

Pion provides various WebRTC configurations and protocols including EPR, UDPMUX, TCPMUX, NAT1TO1, ICE-LITE, ICE-TCP, etc. These techniques allow more setup flexibility beyond TURN/STUN, such as limiting port ranges or using a single port for many connections. This should be implemented with GStreamer's webrtcbin.

Frontend:

https://web.dev/articles/requestvideoframecallback-rvfc It seems that when the system is in a low power efficiency mode, video decoding is not done quickly, as in the example. This leads to perceived increased latency because the frames aren't getting painted as often as they should. Some settings in WebRTC or
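To quantify how irregularly frames are painted, the metadata passed to requestVideoFrameCallback can be sampled and summarized. A rough sketch, assuming only the `presentationTime` and `presentedFrames` metadata fields; `summarizeFrames` and `watchVideo` are hypothetical names, and the window size of 120 frames is an arbitrary choice.

```javascript
// Summarize paint regularity from rVFC metadata samples.
// Each sample is { presentationTime (ms), presentedFrames (counter) }.
function summarizeFrames(samples) {
  if (samples.length < 2) {
    return { meanGapMs: 0, maxGapMs: 0, missedCallbacks: 0 };
  }
  let maxGapMs = 0;
  let missedCallbacks = 0;
  for (let i = 1; i < samples.length; i++) {
    const gap = samples[i].presentationTime - samples[i - 1].presentationTime;
    if (gap > maxGapMs) maxGapMs = gap;
    // presentedFrames advancing by more than 1 means frames were presented
    // without this callback having run for them.
    const delta = samples[i].presentedFrames - samples[i - 1].presentedFrames;
    if (delta > 1) missedCallbacks += delta - 1;
  }
  const span =
    samples[samples.length - 1].presentationTime - samples[0].presentationTime;
  return { meanGapMs: span / (samples.length - 1), maxGapMs, missedCallbacks };
}

// Browser-only hook (hypothetical): report a summary every windowSize frames.
function watchVideo(video, onSummary, windowSize = 120) {
  const samples = [];
  const onFrame = (now, metadata) => {
    samples.push({
      presentationTime: metadata.presentationTime,
      presentedFrames: metadata.presentedFrames,
    });
    if (samples.length >= windowSize) {
      onSummary(summarizeFrames(samples));
      samples.length = 0;
    }
    video.requestVideoFrameCallback(onFrame);
  };
  video.requestVideoFrameCallback(onFrame);
}
```

Comparing `maxGapMs` and `missedCallbacks` between normal and power-saving modes would be one way to check whether the perceived latency really comes from infrequent painting.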

Current configuration (adapted from https://groups.google.com/g/discuss-webrtc/c/wtuhQu6c1KY/m/Usq84y0mAQAJ; a bit of a CPU hog but acceptable with async; it could be optimized further, and the actual effect of this configuration in web browsers still needs to be assessed):

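// Repeatedly emit minimum latency target on every receiver.
// jitterBufferTarget is the current attribute name; jitterBufferDelayHint and
// playoutDelayHint are older names for the same hint, set for compatibility.
webrtc.peerConnection.getReceivers().forEach((receiver) => {
    const intervalLoop = setInterval(async () => {
        // Stop once the track has ended or the transport is missing or no longer connected
        if (receiver.track.readyState !== "live" || receiver.transport?.state !== "connected") {
            clearInterval(intervalLoop);
            return;
        }
        receiver.jitterBufferTarget = receiver.jitterBufferDelayHint = receiver.playoutDelayHint = 0;
    }, 15);
});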

https://www.w3.org/2021/03/media-production-workshop/talks/slides/sergio-garcia-murillo-whip.pdf https://groups.google.com/g/discuss-webrtc/c/wtuhQu6c1KY https://henbos.github.io/webrtc-timing/ https://github.com/jakearchibald/web-platform-tests/blob/master/webrtc-extensions/RTCRtpReceiver-playoutDelayHint.html https://mediasoup.discourse.group/t/webrtc-playout-delay-extension/2067 https://issues.chromium.org/issues/324276557 https://bugzilla.mozilla.org/show_bug.cgi?id=1592988 https://groups.google.com/a/chromium.org/g/blink-dev/c/4W4orKqA3Rs https://www.reddit.com/r/WebRTC/comments/ipewaq/disable_use_of_jitter_buffer/?rdt=58693

ehfd commented 1 month ago

Outstanding issues with GStreamer (also see https://github.com/selkies-project/selkies-gstreamer/issues/34#issuecomment-2165267433):

Major: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1261

Minor: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1494 https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3482 https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3549

ehfd commented 1 week ago

The answer is all in: https://github.com/webrtc-sdk/libwebrtc

Someone's going to have to dive into this.