`onrtpreceived` has `readReceivedRtp` and `onpacketizedrtpavailable` has `readPacketizedRtp` (so you don't have to handle the event), but `onrtpsent` currently does require per-packet processing, and `onrtpacksreceived` requires per-feedback-message processing. So that's where we would need attention, more so on `onrtpsent`.
Feedback messages are at least somewhat aggregated: they are sent every 50–250 ms in libwebrtc, and recommended per frame or less often in the IETF draft.
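To make the asymmetry concrete, here is a minimal sketch of the two patterns. Only the method and handler names come from the discussion above; every signature, field, and object name is invented for the example:

```ts
// Hypothetical shapes for illustration only; every signature and field here
// is an assumption, not the spec.
interface RtpPacket { sequenceNumber: number; payload: ArrayBuffer; }
interface RtpReceiveStream {
  readReceivedRtp(maxCount: number): RtpPacket[];
}
interface RtpSendStream {
  onrtpsent: ((ev: { packet: RtpPacket }) => void) | null;
}

declare const rtpReceiveStream: RtpReceiveStream;
declare const rtpSendStream: RtpSendStream;
declare function processIncoming(p: RtpPacket): void;
declare function noteSentPacket(ev: { packet: RtpPacket }): void;

// Receive side: drain whatever has accumulated, e.g. on a timer, so no
// per-packet JS callback is needed.
setInterval(() => {
  for (const p of rtpReceiveStream.readReceivedRtp(128)) {
    processIncoming(p);
  }
}, 20);

// Send side as currently sketched: one JS event (and one task) per packet,
// so callback cost scales with the packet rate.
rtpSendStream.onrtpsent = (ev) => noteSentPacket(ev);
```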
`onrtpsent` is my bigger concern, indeed. Is it enough to just get the actual sent timestamps somewhere in the `RtpAcks` message and not be notified on every send? That depends on whether it's the UA deciding that packets are lost and notifying up (akin to the `TransportPacketsFeedback` struct in libwebrtc that is passed to the GoogCC implementation), or the JS making its own decision based on being notified of a send but not of a remote feedback ack.
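For illustration, one possible shape for ack feedback that carries the actual sent timestamps, with hypothetical field names; this is a sketch of the idea, not what the draft defines:

```ts
// Hypothetical shape only, assuming the ack feedback itself carried the
// UA-recorded send timestamps.
interface AckedPacket {
  sequenceNumber: number;
  sentTime: DOMHighResTimeStamp;            // stamped by the UA at actual send time
  remoteReceiveTime?: DOMHighResTimeStamp;  // absent if the packet was reported lost
}
interface RtpAcks {
  acks: AckedPacket[];
  feedbackReceivedTime: DOMHighResTimeStamp;
}

// A JS congestion controller could then run purely off aggregated feedback,
// roughly analogous to libwebrtc feeding TransportPacketsFeedback into GoogCC,
// without needing a per-packet onrtpsent callback.
function onAcks(feedback: RtpAcks): void {
  for (const a of feedback.acks) {
    const lost = a.remoteReceiveTime === undefined;
    // ...update the estimator with (a.sentTime, a.remoteReceiveTime, lost)...
  }
}
```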
It would be good to do some measurements to validate that we are not over-optimising at the expense of API ease of use. For instance, it is possible for the UA to batch the RTP-sent events and do a single hop to the worker thread for multiple `rtpsent` events. That might not fix the JS event-management overhead, but maybe that overhead is not a blocker.
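A rough sketch of what that UA-side batching could look like, using a hypothetical `rtpsentbatch` event purely for illustration; nothing here is proposed API:

```ts
// Hypothetical batched event: the UA accumulates sends and delivers them in
// one task, instead of one event per packet.
interface SentRtp {
  sequenceNumber: number;
  sendTime: DOMHighResTimeStamp;
}
interface RtpSentBatchEvent extends Event {
  sent: SentRtp[]; // everything sent since the previous event
}

declare const sendStream: EventTarget;
declare function recordSendTime(s: SentRtp): void;

// One worker-thread hop now covers N packets, amortising the event overhead
// while JS still observes every send.
sendStream.addEventListener("rtpsentbatch", (e) => {
  for (const s of (e as RtpSentBatchEvent).sent) {
    recordSendTime(s);
  }
});
```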
I definitely agree we need some more numbers here to get a better sense of the sweet spot of the tradeoff. A previous design discussion came to the broad consensus that something like >10 kHz of events is probably too much and <100 Hz is probably fine, but that was largely based on gut feeling, iirc. The best evidence we've found for the overheads in Chromium has come from looking at trace recordings and pprofs: e.g. a single audio Encoded Frame transform, so running at 50 Hz, was increasing the CPU load of a Meet page by 0.2% just to get the audio data from webrtc wrapped in an ArrayBuffer and into a JS event (internal ref: http://shortn/_ZzJdD2pcQS). I'll try to get something less anecdotal that can be shared publicly in detail. Assuming this is representative though, and scales ~linearly, launches needing to get towards the kHz range to be useful would start being blocked as major CPU regressions.
This did at least result in some V8 optimizations (e.g. crbug.com/40287747), but there's still plenty of cost at high frequency.
This issue was mentioned in the WebRTC Interim, 21 May 2024 (RtpTransport (Peter Thatcher)).
Note from discussion: Would be good to have "batch read" methods for `onrtpsent` and `onrtpacksreceived`, like `readSentRtp` and `readAckedRtp`.
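A sketch of how such batch-read methods might be used; only the method and handler names come from the note above, while the signatures, return shapes, and the coalesced "data available" event semantics are assumptions:

```ts
// Illustrative shapes only.
interface SentRtp { sequenceNumber: number; sendTime: DOMHighResTimeStamp; }
interface AckedRtp { sequenceNumber: number; remoteReceiveTime?: DOMHighResTimeStamp; }
interface RtpSendStream {
  onrtpacksreceived: (() => void) | null;   // fired as a coalesced "data available" signal
  readSentRtp(maxCount: number): SentRtp[];
  readAckedRtp(maxCount: number): AckedRtp[];
}

declare const sendStream: RtpSendStream;
declare function updateBandwidthEstimate(sent: SentRtp[], acked: AckedRtp[]): void;

sendStream.onrtpacksreceived = () => {
  // One callback drains everything queued since the last read.
  const acked = sendStream.readAckedRtp(1000);
  const sent = sendStream.readSentRtp(1000);
  updateBandwidthEstimate(sent, acked);
};
```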
Slides I had at the Discussion Group earlier this week pertaining to this issue: https://docs.google.com/presentation/d/1bIuSiUsAiYsokxfBbxVZiaQwiFqO7D2e6my7dGacHEo/edit#slide=id.p
TL;DR: there are three sources of CPU/power-consumption cost from an API with high-frequency events (detailed in the slides).
I'll prep a PR adding batch interfaces where an app would currently be forced to install Event listeners to get all of the required info.
The alternative to an event that exposes multiple packets is a ReadableStream of packets, which is more or less what WebTransport chose. A packet per microtask is probably cheaper than a packet per event-loop task. Why not make the same choice here?
Also, supporting multiple packet processors/listeners is probably not a requirement, which makes events less interesting.
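A minimal sketch of that stream-based consumption pattern, assuming a hypothetical ReadableStream of sent packets; the stream name and packet shape are made up for the example:

```ts
// Illustrative packet shape and stream; not proposed API.
interface RtpPacket { sequenceNumber: number; payload: ArrayBuffer; }

declare const sentRtpReadable: ReadableStream<RtpPacket>;
declare function handleSentPacket(p: RtpPacket): void;

async function consumeSentRtp(): Promise<void> {
  const reader = sentRtpReadable.getReader();
  for (;;) {
    // Each read resolves as a microtask, so several packets can be consumed
    // within a single event-loop task; this is the cost argument made above.
    const { value, done } = await reader.read();
    if (done) break;
    handleSentPacket(value);
  }
}
```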
We've talked before in the Design Discussion meetings about the need to reduce the CPU & power overhead of using RtpTransport APIs. One large source we've seen in Chrome when working with the Encoded Frame APIs is that frequent (100s of Hz) JS event scheduling brings a significant amount of overhead due to context switching, thread wakeups, JS event management, etc.; e.g. see crbug.com/40942405.
To avoid these issues at the packet level, which is inherently higher frequency than frames, we would need to avoid requiring apps to have JS callbacks execute individually per packet, for send, receive and BWE alike, in order to achieve their use cases.
We've already talked about this a few times in the design discussion meetings, and come to ideas such as having APIs to read batches of all available packets. The same needs to be considered for BWE feedback signals & mechanisms.
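For context, a rough sketch of that batch-reading direction with placeholder names, showing how the app's processing frequency could be decoupled from the packet rate; every identifier here is hypothetical:

```ts
// Placeholder shapes; not proposed API.
interface SendRecord { sequenceNumber: number; sendTime: DOMHighResTimeStamp; }
interface FeedbackRecord { sequenceNumber: number; remoteReceiveTime?: DOMHighResTimeStamp; }
interface BatchedRtpTransport {
  readSentRtp(maxCount: number): SendRecord[];
  readAckedRtp(maxCount: number): FeedbackRecord[];
}

declare const transport: BatchedRtpTransport;
declare function runCustomBwe(sent: SendRecord[], feedback: FeedbackRecord[]): void;

// Run the app's bandwidth estimator at ~20 Hz regardless of packet rate,
// so the JS-side cost no longer scales with packets per second.
setInterval(() => {
  const sent = transport.readSentRtp(Number.MAX_SAFE_INTEGER);
  const feedback = transport.readAckedRtp(Number.MAX_SAFE_INTEGER);
  if (sent.length > 0 || feedback.length > 0) {
    runCustomBwe(sent, feedback);
  }
}, 50);
```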