matrix-org / waterfall

A cascading stream forwarding unit for scalable, distributed voice and video conferencing over Matrix
Apache License 2.0

Investigate possible performance with Pion and compare it to LiveKit #23

Open SimonBrandner opened 2 years ago

SimonBrandner commented 2 years ago

Currently, the SFU is using a little too much CPU; we should fix that.

SimonBrandner commented 1 year ago

Further context: https://matrix.to/#/!MBOHuDMLuTeiWcxenB:matrix.org/$OQVMLthcVtLjFHB4H4nG_RnCigdXmZ7R8OH89UHV0d4?via=matrix.org&via=element.io&via=robin.town

daniel-abramov commented 1 year ago

UPD: So far the performance looks okay-ish. There is nothing particularly crazy in our code that would kill the performance, and it seems that this level of performance is normal for Pion (after all, we're using a very weak machine, and the videos and screen sharing are of quite good quality and high FPS).

The only thing that we might want to do is to compare it to the competition, e.g. to LiveKit. If the performance is more or less identical, then we can't do much about it.

Since it's not a bug in our code, I've changed the tag from Defect to Enhancement.

Sean-Der commented 1 year ago

Hey @daniel-abramov, I would love to help with this!

Pion does some things that aren’t as performant as they could be. If you are interested, I would love your help to fix that :)

daniel-abramov commented 1 year ago

Ah, thanks @Sean-Der 🙂

I think it would be great to hear your expert opinion on the following: we're running an SFU on a very weak VM with a single 2.8 GHz core and 1.9 GB of RAM. Our SFU consumes about 30% CPU in a conference with 5 participants (each publishing video and audio) + 1 screen sharing track.

When the screen sharing track is removed, CPU usage is about 23%. Without the screen sharing, a conference of about 6-7 participants (all sharing video and audio) consumes about 30% of CPU.
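As a rough back-of-the-envelope reading of these figures, the sketch below derives the approximate cost of the screen sharing track and the average cost per publishing participant. It is only arithmetic on the numbers quoted above: `perPublisher` is a hypothetical helper (not part of waterfall or Pion), 6.5 is an assumed midpoint for "6-7 participants", and dividing CPU evenly across publishers is itself an assumption that has not been verified.

```go
package main

import "fmt"

// perPublisher returns the average CPU share (in percentage points) per
// publishing participant. Hypothetical helper for illustration only; it
// assumes CPU cost can be split evenly across publishers.
func perPublisher(cpuPercent, publishers float64) float64 {
	return cpuPercent / publishers
}

func main() {
	const (
		withScreenShare    = 30.0 // % CPU: 5 participants + 1 screen sharing track
		withoutScreenShare = 23.0 // % CPU: the same 5 participants, no screen sharing
		largerCall         = 30.0 // % CPU: about 6-7 participants, no screen sharing
		largerCallSize     = 6.5  // assumed midpoint of "6-7 participants"
	)

	fmt.Printf("screen sharing track: ~%.1f points\n", withScreenShare-withoutScreenShare)
	fmt.Printf("per publisher (5 participants): ~%.1f points\n", perPublisher(withoutScreenShare, 5))
	fmt.Printf("per publisher (6-7 participants): ~%.1f points\n", perPublisher(largerCall, largerCallSize))
}
```

Under these assumptions both conference sizes come out at roughly 4.6 percentage points per publisher, with the screen sharing track adding about 7 points on top.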

I'm not really sure if it's actually that bad (given that we have a very weak machine and that the quality of video and screen sharing, as well as the FPS, are all pretty good).

Since we’re not decoding anything, the only expensive thing that Pion seems to be doing in such cases is encryption/decryption, right? Do you think that such CPU usage could be considered OK for our use case? (I believe that if/once we reduce the quality of the videos and introduce simulcast, the CPU usage might drop.)

poVoq commented 12 months ago

Looking at the element-call repo it seems like they prefer using LiveKit these days?

What does this mean for the future of this very interesting cascading approach?

@daniel-abramov are you still willing to work on this? Maybe as a more general SFU that systems other than Matrix could use as well? Would be a shame if this idea ends up being totally abandoned.

daniel-abramov commented 12 months ago

> What does this mean for the future of this very interesting cascading approach?

To my knowledge, the cascading approach is not abandoned (just postponed a bit), and I believe that it will be implemented eventually (though I think not in the near future). Note that the latest versions of waterfall and element-call that worked together did not really support proper cascading either. That being said, it's possible to have cascading with Element Call and LiveKit (albeit some work must be done in both Element Call and LiveKit to add support for it), so the choice of LiveKit does not really prevent folks from implementing cascading in the future (not to mention that one can exchange the SFU for a different one should this be necessary).

> @daniel-abramov are you still willing to work on this?

Not in the context of waterfall. My personal opinion is that it does not really make much sense to invest too much time in waterfall (or any other Go/Pion SFU for that matter), simply because the LiveKit SFU does most of the things that waterfall did, except that it does not support native Matrix signalling yet. In other words, it's easier to add Matrix signalling support to the LiveKit SFU than to write a new SFU from scratch (signalling and the related parts of the code are at best 10-15% of the code). The thing is that both LiveKit and waterfall use the same tech stack for the SFU, and I very often got the feeling that at least 75% of the code would end up being very similar to that of LiveKit, because the two have so many things in common. Also, LiveKit was much more mature and battle-tested, with a bigger ecosystem of tools, benchmarks and tests, and the best thing is that it's open source!

> Maybe as a more general SFU that systems other than Matrix could use as well?

This is certainly a great approach! I think that it would be amazing to have a scalable and performant SFU that is similar to LiveKit yet also supports different flavors of signalling that allow cascading to be supported. That being said, if I were to start with the design of such an SFU, I would definitely go for a different (Rust-based) tech stack. As for Go/Pion SFUs, IMHO LiveKit is the most performant, reliable and versatile open-source option one could get (truth be told, I have not checked any newer SFUs within the past couple of months).

poVoq commented 12 months ago

I see, thanks for the detailed answer!

There seems to be a project reimplementing Pion in Rust in case you are interested in contributing a cascading mode: https://github.com/webrtc-rs/webrtc

daniel-abramov commented 12 months ago

> There seems to be a project reimplementing Pion in Rust in case you are interested in contributing a cascading mode: https://github.com/webrtc-rs/webrtc

Indeed, I tried webrtc-rs back in December 2022, but back then it was not mature enough, and its API surface was very hard to use. It was not particularly ergonomic because it tried to mimic the API surface of Pion, whereas Pion's API surface seems to have drawn certain inspiration from the JavaScript API. In Rust such an API is not particularly useful, as it is possible to design a more robust set of types that enforce certain invariants at compile time. There are also some other interesting Rust-based projects related to SFUs or WebRTC in general, such as str0m and Signal SFU.