daniel-abramov opened 1 year ago
Hey, thanks, this is awesome! I think you can directly capture a new `I420Buffer`; you don't need to resize the image. Maybe it will be faster? Also, I was thinking about making a new track instead of replacing the logo :)
> I think you can directly capture a new `I420Buffer`
Generally, it should be possible with certain APIs; however, the crate that I used for screen capturing (`screenshots-rs`) uses a simple API and returns an `RgbaImage` back. This is for sure not the most efficient way, which is why I only used it for the example (I believe once the implementation of the screen-capturing API in the SDK is complete, we could remove the `screenshots-rs` dependency for the example use case).
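Roughly, what the example currently does is convert the captured `RgbaImage` into the frame's I420 buffer via the `yuv_helper`. A minimal sketch of that path, assuming a libyuv-style `abgr_to_i420` helper and the `I420Buffer` accessors used in the SDK examples (exact names/signatures may differ):

```rust
use image::RgbaImage;
use livekit::webrtc::native::yuv_helper;
use livekit::webrtc::video_frame::I420Buffer;

// Sketch only: converts an RgbaImage into an I420Buffer of the same size.
// Assumes `yuv_helper` exposes a libyuv-style `abgr_to_i420` (RGBA byte
// order in memory); the exact helper name/signature may differ in the SDK.
fn rgba_to_i420(rgba: &RgbaImage) -> I420Buffer {
    let (width, height) = rgba.dimensions();
    let mut buffer = I420Buffer::new(width, height);

    let (stride_y, stride_u, stride_v) = buffer.strides();
    let (data_y, data_u, data_v) = buffer.data_mut();

    yuv_helper::abgr_to_i420(
        rgba.as_raw(), // tightly packed RGBA pixels
        width * 4,     // source stride in bytes
        data_y, stride_y,
        data_u, stride_u,
        data_v, stride_v,
        width as i32,
        height as i32,
    );

    buffer
}
```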
> you don't need to resize the image
But if I don't resize the captured `RgbaImage` to match the dimensions of the `VideoFrame` (and the dimensions of the `VideoFrame` match the dimensions that the local track was instantiated with), then how is it going to work? Or do you mean that the `yuv_helper` will perform some downscaling if necessary?
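For concreteness, the resizing step I'm referring to is essentially this (a sketch using the `image` crate; the filter choice is only illustrative):

```rust
use image::{imageops, RgbaImage};

// Sketch: scale the captured RgbaImage to the fixed dimensions the
// VideoFrame/track was created with, so the subsequent RGBA -> I420
// conversion fills the buffer exactly.
fn fit_to_frame(captured: &RgbaImage, frame_width: u32, frame_height: u32) -> RgbaImage {
    if captured.dimensions() == (frame_width, frame_height) {
        return captured.clone();
    }
    // Triangle (bilinear) is a relatively cheap CPU filter; none of this
    // is hardware accelerated.
    imageops::resize(captured, frame_width, frame_height, imageops::FilterType::Triangle)
}
```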
> Maybe it will be faster? Also I was thinking about making a new track instead of replacing the logo :)
Indeed so! It was just a very quick experiment to see how some "poor man's screen-sharing" could be built natively with LiveKit's current version of the SDK for a test project of mine, so I decided to share it in case it may be helpful 🙂 I do think that it would indeed be much better if we could capture the screen via the SDK (avoiding dependencies like `screenshots-rs`); that would be more efficient and most likely faster.
@theomonnom I know that this PR seems to conflict with the upstream changes (perhaps I need to extract it into a separate example, which would be a bit cleaner and easier to understand, as I would not need to change much of the `wgpu_room`), but I wanted to clarify that thing with resizing (tested it today) before creating a follow-up PR, so that I know what's the best way to handle the problem.
Problem: When we publish a track, we create a `NativeVideoSource` with specific dimensions, and we then use the same dimensions when creating a `VideoFrame` (or, specifically, when allocating a buffer inside a `VideoFrame`). Then, when new frames containing the captured screen (or window) are received, we fill the underlying `VideoFrame`'s buffer and call `NativeVideoSource::capture_frame` with that `VideoFrame`. This is problematic because if we e.g. capture a window, we cannot guarantee that its size remains constant, hence the frames that we capture may have dimensions that are not aligned with the dimensions of the published track.
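For context, the flow described above looks roughly like this (a sketch based on my reading of the SDK examples; exact module paths and field names may differ):

```rust
use livekit::track::LocalVideoTrack;
use livekit::webrtc::video_frame::{I420Buffer, VideoFrame, VideoRotation};
use livekit::webrtc::video_source::{native::NativeVideoSource, RtcVideoSource, VideoResolution};

const WIDTH: u32 = 1920;
const HEIGHT: u32 = 1080;

// The source (and therefore the published track) is created with fixed
// dimensions up front.
fn make_screen_share_track() -> (NativeVideoSource, LocalVideoTrack) {
    let source = NativeVideoSource::new(VideoResolution { width: WIDTH, height: HEIGHT });
    let track = LocalVideoTrack::create_video_track(
        "screen-share",
        RtcVideoSource::Native(source.clone()),
    );
    (source, track)
}

// Every captured frame then has to fit a buffer of exactly those dimensions.
fn push_frame(source: &NativeVideoSource) {
    let frame = VideoFrame {
        rotation: VideoRotation::VideoRotation0,
        timestamp_us: 0,
        buffer: I420Buffer::new(WIDTH, HEIGHT),
    };
    // ... fill frame.buffer from the captured image (RGBA -> I420) ...
    source.capture_frame(&frame);
}
```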
Solutions: To me it seems like there are generally 2 possible solutions to that:

1. Re-publish the track, re-creating the `NativeVideoSource` with new dimensions that match the dimensions of the thing that we're capturing.
2. Resize the captured image so that it matches the dimensions of the `NativeVideoSource` (i.e. the dimensions of the `VideoFrame`'s buffer), then again everything should work fine.

Both solutions seem to have pros and cons. (1) is good if the dimensions of the underlying captured entity do not change often (re-publishing is probably not that expensive, right?), otherwise (2) would be more efficient (?), but not when the dimensions of the entity being captured are much smaller than the dimensions of the published track (in this case we'd need to perform upscaling). Also, if the underlying library produces `RgbaImage`s when capturing a screen, resizing those may not necessarily be very efficient (not hardware accelerated by default, AFAIK).
So I wonder what's the best strategy to handle it.
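To make the trade-off a bit more concrete, here is a rough sketch of how the per-frame decision could be expressed; `MismatchStrategy` and `handle_dimension_mismatch` are hypothetical names for illustration, not anything that exists in the SDK:

```rust
use image::RgbaImage;

// Hypothetical policy: what to do when the captured window's size no longer
// matches the dimensions the NativeVideoSource was created with.
enum MismatchStrategy {
    Republish, // option (1): re-create the source/track with the new dimensions
    Resize,    // option (2): scale every captured image to the existing dimensions
}

fn handle_dimension_mismatch(
    captured: RgbaImage,
    source_dims: (u32, u32),
    strategy: MismatchStrategy,
) -> RgbaImage {
    if captured.dimensions() == source_dims {
        return captured; // nothing to do, fill the frame buffer directly
    }
    match strategy {
        MismatchStrategy::Republish => {
            // (1) happens outside this function: unpublish the track, create
            // a new NativeVideoSource with captured.dimensions() and publish
            // again. Only cheap if window resizes are rare.
            captured
        }
        MismatchStrategy::Resize => {
            // (2) CPU-side scaling on every frame; upscaling when the window
            // is much smaller than the track wastes cycles and bitrate.
            image::imageops::resize(
                &captured,
                source_dims.0,
                source_dims.1,
                image::imageops::FilterType::Triangle,
            )
        }
    }
}
```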
PS: Another topic, maybe a separate one, is how we handle it on the client side when e.g. we know that viewers (who presumably use the LiveKit JS SDK) render our screen-sharing in a small tile (much smaller than the dimensions of the published track). Should we respond to the events from the other clients by adjusting/re-publishing the track so that we do not encode the whole screen, given that we know that our screen-sharing is viewed on a 300x300 tile or something like that? Probably I'm referring to what is called "dynacast" (but I'm not 100% sure about that) 🙂
A concise follow-up on https://github.com/livekit/rust-sdks/issues/92 - I've decided to build a quick-and-dirty prototype of what it would look like if we captured a screen.
For the demo, I decided to use the `screenshots-rs` library. It's not the fastest one, but it is very easy to use (I wanted to use `scrap`, but it has not been updated for years, and `rustdesk` uses a fork of `scrap` with their own changes that are unfortunately AGPL-licensed). But since this is just for the sake of the example and is going to be replaced with a media devices API eventually, I think it's fine 🙂

I decided not to do further refactorings and improvements for now (I noticed that the example code could be simplified/streamlined a bit) to keep the PR small.
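For completeness, the capture step itself is roughly the following (the `screenshots-rs` API has changed between versions; this sketch assumes a version where `Screen::all()` returns a `Result` and `capture()` yields an `image::RgbaImage` directly):

```rust
use screenshots::Screen;

// Sketch of the capture step; error handling kept minimal on purpose.
// Depending on the screenshots-rs version, capture() may instead return a
// crate-specific image type that needs converting to an RgbaImage first.
fn capture_primary_screen() -> image::RgbaImage {
    let screens = Screen::all().expect("failed to enumerate screens");
    let screen = screens.first().expect("no screens found");
    screen.capture().expect("failed to capture the screen")
}
```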
The `resize()` step is probably very suboptimal at this stage, and I'm not sure what the most elegant and fast solution is there: when real screen-sharing is used (possibly when capturing a single window), the size of the window changes. However, the dimensions of the source are configured upfront, so I wonder whether changing the size of the captured window should result in re-publishing the track (to change the source resolution) or whether the resizing step would be necessary (to make sure that the image fits into what was configured on the source level). That being said, perhaps the current resizing step could be done in a better way.
P.S.: `screenshots-rs` screen capturing is not super fast, and given that I additionally `resize()` the image, I doubt that a very high FPS is achievable like this (without overloading the CPU). Looking forward to a native API to efficiently capture the screen and send it to the source 🙂
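One way to sanity-check how much FPS is realistic is to time the whole per-frame path (capture + `resize()` + RGBA-to-I420 conversion); a small, hypothetical helper for that:

```rust
use std::time::Instant;

// Rough per-frame cost measurement: if one capture + resize + convert takes
// ~40 ms, the loop tops out at roughly 25 FPS before saturating a core.
fn measure_frame_cost<F: FnMut()>(mut produce_frame: F, iterations: u32) {
    let start = Instant::now();
    for _ in 0..iterations {
        produce_frame(); // capture + resize + RGBA -> I420 + capture_frame
    }
    let per_frame = start.elapsed() / iterations;
    println!(
        "avg per-frame cost: {:?} (~{:.1} FPS upper bound)",
        per_frame,
        1.0 / per_frame.as_secs_f64()
    );
}
```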