microsoft / MixedReality-WebRTC

MixedReality-WebRTC is a collection of components to help mixed reality app developers integrate audio and video real-time communication into their application and improve their collaborative experience
https://microsoft.github.io/MixedReality-WebRTC/
MIT License

Cannot add more than two Peer Connections with video and audio streams #822

Open fc-ts opened 2 years ago

fc-ts commented 2 years ago

Describe the bug I'm using this library to make a video conference app in which users from different platforms implementing the WebRTC protocol can join the call. Everything works until a third user joins the call sharing video and audio streams; at that point the app on the HoloLens 2 crashes with: Exception thrown at 0x74D4141A (vcruntime140_app.dll) in AppName.exe: 0xC0000005: Access violation reading location 0x21900040.

I also tried the same procedure in the Unity Editor. Even though it does not always crash there, I found out that the error is thrown somewhere in the C++ implementation, starting from the first LoadRawTextureData operation in the VideoRenderer script. Notably, the error is not thrown when adding and removing participants with both audio and video streams multiple times in a row.

To Reproduce Steps to reproduce the behavior:

  1. Start a video call between a HoloLens 2 user and a browser user streaming audio and video.
  2. Add a third user from a tablet streaming audio and video.
  3. Add a fourth user from a tablet streaming audio and video.
  4. Before the video stream of the fourth user is rendered, the app crashes, pointing to an error at the C++ level: Exception thrown at 0x74D4141A (vcruntime140_app.dll) in AppName.exe: 0xC0000005: Access violation reading location 0x21900040.

Expected behavior I would expect to be able to render 3+ video and audio streams at the same time.


So I was wondering whether this library has ever been tested with (or was intended to support) multiple simultaneous user streams.

Thanks in advance; I'm happy to provide more information if necessary.

djee-ms commented 2 years ago

Are you using a media server, with one PeerConnection per remote participant?

fc-ts commented 2 years ago

Are you using a media server, with one PeerConnection per remote participant?

Yes
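
A minimal sketch of the "one PeerConnection per remote participant" setup being discussed, assuming only the basic PeerConnection lifecycle from this library (construction, InitializeAsync, Dispose); the helper class and participant IDs are hypothetical, and all signaling and track wiring is omitted:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.MixedReality.WebRTC;

// Hypothetical helper: keeps one PeerConnection per remote participant,
// as in the media-server topology described above.
class ParticipantConnections : IDisposable
{
    private readonly Dictionary<string, PeerConnection> _connections =
        new Dictionary<string, PeerConnection>();

    // Create and initialize a connection when a new participant joins.
    public async Task<PeerConnection> AddParticipantAsync(string participantId)
    {
        var pc = new PeerConnection();
        await pc.InitializeAsync(); // default configuration; real code would pass ICE servers etc.
        _connections.Add(participantId, pc);
        return pc;
    }

    // Tear down the connection when a participant leaves the call.
    public void RemoveParticipant(string participantId)
    {
        if (_connections.TryGetValue(participantId, out var pc))
        {
            pc.Dispose();
            _connections.Remove(participantId);
        }
    }

    public void Dispose()
    {
        foreach (var pc in _connections.Values)
        {
            pc.Dispose();
        }
        _connections.Clear();
    }
}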

djee-ms commented 2 years ago

I'm not sure how to help here; it looks like the frame buffer passed to LoadRawTextureData is somehow invalid. This is (almost directly) coming from the internal libwebrtc C++ implementation from Google. Without a call stack or crash dump I don't see how we can diagnose this, and since we don't publish debug symbols, that means rebuilding the project from scratch.

fc-ts commented 2 years ago

I'm not sure how to help here; it looks like the frame buffer passed to LoadRawTextureData is somehow invalid. This is (almost directly) coming from the internal libwebrtc C++ implementation from Google. Without a call stack or crash dump I don't see how we can diagnose this, and since we don't publish debug symbols, that means rebuilding the project from scratch.

I was also able to make the app crash while reproducing the same procedure in the Unity Editor. In case it helps, I'm sharing the error.log file generated by Unity for the crash.

The problem seems to be exactly what you pointed out, but maybe with this additional information you can see something I'm missing: error.log

fc-ts commented 2 years ago

Okay I have some new info.

Basically, some time ago I made changes to the VideoRenderer script because I noticed that video tracks with an odd dimension (width and/or height) rendered incorrectly.

With some debugging and searching, I found out that the pixel strides for the U and V planes of the rendered YUV image are equal to half of the Y plane dimensions rounded up, while the VideoRenderer script https://github.com/microsoft/MixedReality-WebRTC/blob/master/libs/unity/library/Runtime/Scripts/Media/VideoRenderer.cs, at lines 255-256, sets chromaWidth and chromaHeight to half of the luma dimensions rounded down:

int chromaWidth = lumaWidth / 2;
int chromaHeight = lumaHeight / 2;

So I tried to add these lines of code to fix the issue:

// Round up if the lumaWidth and/or lumaHeight are odd, because the pixel stride of U and V
// planes is half of the Y plane size rounded up.
int chromaWidth = lumaWidth / 2;
if (lumaWidth % 2 != 0)
{
    chromaWidth++;
}
int chromaHeight = lumaHeight / 2;
if (lumaHeight % 2 != 0)
{
    chromaHeight++;
}

But it wasn't enough, so I then experimented and modified these lines of the original script (VideoRenderer, lines 280-287) from this:

var src = new IntPtr(buffer);
int lumaSize = lumaWidth * lumaHeight;
_textureY.LoadRawTextureData(src, lumaSize);
src += lumaSize;
int chromaSize = chromaWidth * chromaHeight;
_textureU.LoadRawTextureData(src, chromaSize);
src += chromaSize;
_textureV.LoadRawTextureData(src, chromaSize);

to this:

int lumaSize = lumaWidth * lumaHeight;
_textureY.LoadRawTextureData(frameData.DataY, lumaSize);

int chromaSize = chromaWidth * chromaHeight;
_textureU.LoadRawTextureData(frameData.DataU, chromaSize);
_textureV.LoadRawTextureData(frameData.DataV, chromaSize);

I did it just for the sake of experimentation and because it was more intuitive to me.

But surprisingly, with this last piece of code, the YUV image for odd video dimensions was fixed and therefore rendered correctly.

I was pretty sure I knew what I was doing, but now that this bug has emerged I guess something is missing in my changes, because if I revert to the original VideoRenderer code, the app does not crash even with 4+ video tracks and peer connections streaming at the same time. (But obviously the problem with odd-dimensioned videos pops up again, which I need to avoid.)

I hope this information helps you dig deeper into the problem!

I'd also really like to know whether, in your opinion, the changes I made to fix the odd-dimension video problem make sense, or whether I did something wrong.

djee-ms commented 2 years ago

int chromaWidth = lumaWidth / 2;

You're perfectly correct, that's wrong!

int chromaWidth = lumaWidth / 2;
if (lumaWidth % 2 != 0)
{
    chromaWidth++;
}

That works, though a more compact form is int chromaWidth = (lumaWidth + 1) / 2; which transforms a "round down" into a "round up". That's what we do internally.
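
As a small illustration of that compact form (a sketch only; the helper name is made up), the rounded-up halving can be applied to both dimensions at once:

// Illustrative helper (hypothetical name): half of the luma dimensions,
// rounded up, so odd luma sizes map to the chroma plane sizes the native
// I420 buffers actually use (e.g. 641 -> 321, while 640 -> 320).
static (int width, int height) GetChromaSize(int lumaWidth, int lumaHeight)
{
    int chromaWidth = (lumaWidth + 1) / 2;
    int chromaHeight = (lumaHeight + 1) / 2;
    return (chromaWidth, chromaHeight);
}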

_textureU.LoadRawTextureData(frameData.DataU, chromaSize);

I can't remember the details, but looking at your change it makes a lot of sense. We shouldn't assume in the code that the Y/U/V planes live in the same memory block, as we currently do: since the frame contains three pointers, the implementation is allowed to allocate two or three separate memory blocks, so your code is more correct.
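
Putting the two changes from this thread together, a sketch of what the plane copy could look like (assuming, as in the snippets above, a frame exposing DataY/DataU/DataV pointers and the three _textureY/_textureU/_textureV textures, and assuming each plane is tightly packed, i.e. its stride equals its width):

// Compute plane sizes, rounding the chroma dimensions up so that odd
// luma sizes match the native I420 layout.
int lumaSize = lumaWidth * lumaHeight;
int chromaWidth = (lumaWidth + 1) / 2;
int chromaHeight = (lumaHeight + 1) / 2;
int chromaSize = chromaWidth * chromaHeight;

// Load each plane from its own pointer instead of assuming the three
// planes live in one contiguous memory block.
_textureY.LoadRawTextureData(frameData.DataY, lumaSize);
_textureU.LoadRawTextureData(frameData.DataU, chromaSize);
_textureV.LoadRawTextureData(frameData.DataV, chromaSize);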

I'm happy to review a fix if you open a PR for that change; I'm pretty sure this is all there is to your crash. The fact that it's non-deterministic may be due to some internal logic of how the buffer(s) are allocated in libwebrtc, which I don't know or don't remember. In any case we shouldn't be making the assumptions we currently make, and your change fixes that.

fc-ts commented 2 years ago

First of all, thanks for the fast responses!

I'm happy to see the changes are correct and I will certainly open a PR for these!

About the crash: why do you think it does not happen with the old code, given that it was "less correct" than the new one?

djee-ms commented 2 years ago

Not sure. If you increased the chroma size calculated in C# only (your first change), then you increased chromaSize, so the last call, for the V plane, could possibly read past the end of the buffer?
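
As a worked example of that hypothesis (illustrative numbers only, assuming a single contiguous Y+U+V buffer that was sized with the old rounded-down chroma dimensions while the reads use the new rounded-up ones):

// Hypothetical 641x481 frame.
int lumaWidth = 641, lumaHeight = 481;
int lumaSize = lumaWidth * lumaHeight;                               // 308,321 bytes
int oldChromaSize = (lumaWidth / 2) * (lumaHeight / 2);              // 320 * 240 = 76,800
int newChromaSize = ((lumaWidth + 1) / 2) * ((lumaHeight + 1) / 2);  // 321 * 241 = 77,361

int bufferSize = lumaSize + 2 * oldChromaSize;  // 461,921 bytes actually allocated
int bytesRead  = lumaSize + 2 * newChromaSize;  // 463,043 bytes consumed by the three reads
// The final V-plane read would end 1,122 bytes past the end of the
// buffer, which is exactly the kind of out-of-bounds access that can
// produce an access violation like the one reported above.

In practice the crash point depends on how the native side allocates and pads its buffers, so treat these numbers as an illustration rather than a diagnosis.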

fc-ts commented 2 years ago

Unfortunately, the funny thing is that it crashes on the Y component

_textureY.LoadRawTextureData(frameData.DataY, lumaSize);

so it never even gets to loading the U and V data!

astaikos316 commented 2 years ago

Are you using a media server, with one PeerConnection per remote participant?

Yes

@fc-ts May I ask what media server you are using and how you set up the peers in your Unity project on the HoloLens? I've been trying to figure out a way to do this for some time, but have run into numerous performance issues.