microsoft / MixedReality-WebRTC

MixedReality-WebRTC is a collection of components to help mixed reality app developers integrate audio and video real-time communication into their application and improve their collaborative experience
https://microsoft.github.io/MixedReality-WebRTC/
MIT License

Can a custom local video source (like a RenderTexture) be supported in Unity? #35

Closed raiscui closed 4 years ago

raiscui commented 5 years ago

Can a custom local video source be supported, like a RenderTexture or a texture rendered by some video player? I also want to get and send RGB/Depth frames from a Kinect-DK.

djee-ms commented 5 years ago

Hi @raiscui

  1. For the custom video source, I assume you mean being able to send custom frames coming from your app (like generated images) to WebRTC, rather than frames captured directly from the camera. I would like to add that feature eventually, as well as allow the application to manipulate the camera itself. However, I have seen little demand for it so far, and I currently have other high-demand features in the works, so I wouldn't expect it in the short term. (A rough sketch of what such an API could look like is at the end of this comment.)

  2. For the Azure Kinect-DK, I unfortunately haven't had a chance yet to try it with this project. For separate RGB or Depth streams, I think this should work, though this is probably not very interesting. If, however, you want synchronized RGB-D frames from both the RGB and Depth sensors, then these need to be captured via a specialized API like MediaFrameSourceGroup on UWP. I see two solutions for that:

    • If custom sources (1.) were implemented you could capture the frames yourself with that API and send them to WebRTC.
    • Otherwise, in the current situation where the camera is managed by WebRTC itself, I suggest you open an issue with the WebRTC UWP project to see if they can add support on their side, since MixedReality-WebRTC currently leverages their code on UWP to capture video frames.

I appreciate these are not great answers, and there is no way currently to do any of that without modifying the code of MixedReality-WebRTC and/or the code of the WebRTC UWP project. If you want to give it a try however and submit a PR for that then we can talk about how to proceed.
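To make item 1. more concrete, here is a rough sketch of what a callback-based custom video source could look like. None of these type or method names exist in MixedReality-WebRTC today; they are purely hypothetical and only illustrate the idea of the app pushing its own ARGB frames into the video pipeline.

```csharp
// Hypothetical sketch only: a callback-based custom video source where the
// app, not the camera, supplies the frames. All names here (IFrameRequest,
// CompleteWithArgb32Frame, ...) are illustrative assumptions, not actual
// MixedReality-WebRTC API.
using System;
using System.Runtime.InteropServices;

// Interface the library would hypothetically expose for each frame request.
public interface IFrameRequest
{
    void CompleteWithArgb32Frame(IntPtr data, int strideBytes, int width, int height);
}

public class CustomFrameProducer
{
    // Pixel buffer owned by the app, e.g. copied from a RenderTexture or a
    // Kinect RGB frame.
    private readonly byte[] _argbPixels;
    private readonly int _width;
    private readonly int _height;

    public CustomFrameProducer(int width, int height)
    {
        _width = width;
        _height = height;
        _argbPixels = new byte[width * height * 4];
    }

    // Called whenever the (hypothetical) external video source wants a frame.
    public void OnFrameRequested(IFrameRequest request)
    {
        GCHandle handle = GCHandle.Alloc(_argbPixels, GCHandleType.Pinned);
        try
        {
            request.CompleteWithArgb32Frame(
                handle.AddrOfPinnedObject(), // pointer to ARGB pixel data
                _width * 4,                  // stride in bytes
                _width,
                _height);
        }
        finally
        {
            handle.Free();
        }
    }
}
```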

djee-ms commented 4 years ago

Quick update - Azure Kinect doesn't work out of the box because it has a 7-microphone array, and WebRTC internally only supports audio capture devices with 4 channels. This is a problem with the Google implementation: https://bugs.chromium.org/p/webrtc/issues/detail?id=10881

djee-ms commented 4 years ago

Also, an important note @raiscui: just plugging the Azure Kinect into your PC will break WebRTC initialization, even if you don't plan to use it. This is again due to Google's implementation, and the issue was closed as "By Design": https://bugs.chromium.org/p/webrtc/issues/detail?id=9653

jahnotto commented 4 years ago

Any news on item 1, @djee-ms ?

I am doing some custom rendering in a Windows application and would like to stream ARGB frames to Unity. As far as I can tell, I can't use any video source other than the webcam.

arufolo-wayahealth commented 4 years ago

+1 for this feature.

jahnotto commented 4 years ago

Perhaps the frames could be sent as raw pixels through a datachannel? We'd miss compression though.

djee-ms commented 4 years ago

@jahnotto no, there is a much better way with a custom video source. I too would really like to see that feature. Unfortunately we're currently quite busy with the v1.0 release planned for the end of the month, and with some blockers like #74. I will see if I have some spare cycles to push a rough implementation, even a partial one, but I cannot promise you anything quite yet.

jahnotto commented 4 years ago

Ok, thanks! Looking forward to getting a custom video source.

Meanwhile, I'll use a DataChannel just for prototyping.

jahnotto commented 4 years ago

If anyone else is interested: as a temporary workaround, I am using a DataChannel to send the frames from a Windows desktop application to a Unity (HoloLens) app. It's very slow, and the messages are split into multiple sub-messages that need to be aggregated on the HoloLens (roughly as sketched below). It works for prototyping a solution though :) Looking forward to a proper custom video source implementation.
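A minimal sketch of that chunking workaround, assuming the MixedReality-WebRTC C# `DataChannel.SendMessage(byte[])` API; the 16 KB chunk size and the small header layout are arbitrary choices for illustration, not library limits:

```csharp
// Rough sketch of the chunking workaround described above: each encoded frame
// is split into [frameId | chunkIndex | chunkCount | payload] messages so the
// receiver can reassemble them in order.
using System;
using Microsoft.MixedReality.WebRTC;

public static class FrameChunker
{
    private const int ChunkPayloadSize = 16 * 1024; // arbitrary chunk size

    public static void SendFrame(DataChannel channel, int frameId, byte[] frame)
    {
        int chunkCount = (frame.Length + ChunkPayloadSize - 1) / ChunkPayloadSize;
        for (int i = 0; i < chunkCount; ++i)
        {
            int offset = i * ChunkPayloadSize;
            int size = Math.Min(ChunkPayloadSize, frame.Length - offset);

            // 12-byte header followed by the chunk payload.
            var message = new byte[12 + size];
            BitConverter.GetBytes(frameId).CopyTo(message, 0);
            BitConverter.GetBytes(i).CopyTo(message, 4);
            BitConverter.GetBytes(chunkCount).CopyTo(message, 8);
            Buffer.BlockCopy(frame, offset, message, 12, size);

            channel.SendMessage(message);
        }
    }
}
```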

@djee-ms , would you like me to create a new enhancement issue for a custom video source, or should we just use this issue?

djee-ms commented 4 years ago

No, I think we can keep this issue open for the custom video source enhancement; the title and discussion are relevant.

jahnotto commented 4 years ago

In my temporary workaround using a datachannel, only a few frames are received if I send them too often (like every 100 ms). It almost seems like the message bus is flooded.

If I send the messages only once per second, it works fine.

Is this to be expected?

djee-ms commented 4 years ago

Did you check the buffering of the data channel? I didn't try it myself, but I know that the internal data channel buffer can get saturated if you try to send faster than it can handle, and in that case calls to Send will fail and drop the data without sending it. You should monitor the OnBufferingChanged event and make sure the buffer doesn't get full. See the comment on the buffering event in the native C++ library: https://github.com/microsoft/MixedReality-WebRTC/blob/59df5a141edf80639b08e40480b7a2bca85595d8/libs/Microsoft.MixedReality.WebRTC.Native/src/data_channel.h#L51-L59

Unfortunately I just noticed that on the C# side the OnBufferingChanged handler is marked as internal and does not dispatch to a publicly-accessible event. You can make a local modification to expose it in the meantime; I will fix that.
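As a rough illustration, once the buffering event is exposed on the C# DataChannel, usage could look like the sketch below. The exact event signature is an assumption here (mirroring the previous size / new size / capacity values of the native callback), so treat this as pseudocode against a hypothetical signature rather than the shipped API.

```csharp
// Sketch of back-pressure monitoring, assuming a publicly exposed
// BufferingChanged event with a (previousSize, currentSize, capacity)
// signature; that signature is an assumption, not confirmed API.
using System.Threading;
using Microsoft.MixedReality.WebRTC;

public class BackPressureGate
{
    private long _bufferedBytes;
    private long _capacityBytes = long.MaxValue;

    public void Attach(DataChannel channel)
    {
        channel.BufferingChanged += (previous, current, limit) =>
        {
            Interlocked.Exchange(ref _bufferedBytes, (long)current);
            Interlocked.Exchange(ref _capacityBytes, (long)limit);
        };
    }

    // Only send when the internal buffer has room for the message; otherwise
    // drop or delay the frame instead of letting SendMessage fail.
    public bool TrySend(DataChannel channel, byte[] message)
    {
        if (Interlocked.Read(ref _bufferedBytes) + message.Length >
            Interlocked.Read(ref _capacityBytes))
        {
            return false;
        }
        channel.SendMessage(message);
        return true;
    }
}
```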

jahnotto commented 4 years ago

Thanks again -- I will try that!

djee-ms commented 4 years ago

I pushed a change that should help, which exposes an event publicly, and ensures an exception is thrown if trying to send too fast. See 1bc2ca661399f038050b0154d7b4e9a2f87c9f5f.

jahnotto commented 4 years ago

I am indeed getting an exception when I send too fast. The BufferingChanged event is never fired though.

gtk2k commented 4 years ago

+1 I really want this feature.

iamrohit1 commented 4 years ago

+1 This would be very helpful. I have a native plugin set up in Unity which takes the render texture data and encodes it using nvencode, outputting raw H.264 packets. Is there a way to tap these into the stream?

djee-ms commented 4 years ago

Soon! This is on the roadmap for the next 1.1 release, hopefully by the end of the month or so, and there's already some work done for it. It needs a few bug fixes, and some more testing and polish, before it's ready to be committed.

jahnotto commented 4 years ago

As requested by @djee-ms on the mixedreality-webrtc Slack channel, I'll describe our use case here:

We are doing raycast volume rendering for a HoloLens 2. As the HoloLens is not powerful enough for this type of rendering, we are doing remote rendering on a PC.

Some definitions used in this solution:

  • Client: the Unity app running on the HoloLens.
  • Render server: a desktop application running on a PC with a powerful graphics card; it uses VTK for rendering.

The desired data flow is as follows:

  1. For each Update() in a Unity GameObject, send a render request through a DataChannel from the client to the render server. The render request is a data structure containing all the information needed to set up the view frustum on the render server (a sketch of this structure follows the list). This includes:
    • camera position
    • camera forward and up vectors
    • desired resolution of the rendered frame
    • stereo separation
    • projection matrices for left and right eye, respectively
    • a unique ID identifying the render request
  2. The render server configures the VTK view frustum according to the newest render request. Any render request received before the newest request is discarded. The render server may hence receive multiple render requests for each actual rendered frame, where only the newest request is actually rendered.
  3. The render server renders left and right eye to two separate RGBA bitmaps
  4. The two bitmaps are merged side by side into a single bitmap. The horizontal resolution of the merged bitmap is thus twice the resolution of each individual bitmap.
  5. The merged bitmap is fed into the video stream along with the render request ID
  6. The video is streamed from the render server to the client using WebRTC.
  7. The client receives the video stream and displays it as a quad texture. Note that the client needs to know the render request id for each received frame in the video stream.
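Here is the sketch mentioned in step 1: one possible shape for the render request structure, serialized to bytes so it can be sent over a DataChannel. The field layout and the binary format are illustrative assumptions, not our actual wire format.

```csharp
// Illustrative render request: camera pose, per-eye resolution, stereo
// separation, projection matrices, and a unique ID echoed back with the
// rendered frame. Serialization here is a simple little-endian BinaryWriter.
using System.IO;
using UnityEngine;

public struct RenderRequest
{
    public int RequestId;              // unique ID identifying this request
    public Vector3 CameraPosition;
    public Vector3 CameraForward;
    public Vector3 CameraUp;
    public int Width, Height;          // desired resolution per eye
    public float StereoSeparation;
    public Matrix4x4 ProjectionLeft;
    public Matrix4x4 ProjectionRight;

    public byte[] Serialize()
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(RequestId);
            WriteVector(writer, CameraPosition);
            WriteVector(writer, CameraForward);
            WriteVector(writer, CameraUp);
            writer.Write(Width);
            writer.Write(Height);
            writer.Write(StereoSeparation);
            WriteMatrix(writer, ProjectionLeft);
            WriteMatrix(writer, ProjectionRight);
            return stream.ToArray();
        }
    }

    private static void WriteVector(BinaryWriter writer, Vector3 v)
    {
        writer.Write(v.x); writer.Write(v.y); writer.Write(v.z);
    }

    private static void WriteMatrix(BinaryWriter writer, Matrix4x4 m)
    {
        for (int i = 0; i < 16; ++i) { writer.Write(m[i]); }
    }
}
```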

At the moment, we are using a temporary workaround for the following steps:

  1. Each frame is compressed to a jpeg image
  2. We are using a raw TCP socket connection to send each frame to the client along with the corresponding render request ID.

Let me know if you need any further clarification or if you have any ideas on how to improve the data flow.

chrisse27 commented 4 years ago

@djee-ms Our use-case looks like this:

  1. Capture video stream from frame-grabber (or webcam) in Unity.
  2. Process each frame, e.g. cropping, masking.
  3. Send processed frame to HoloLens.
  4. Render frame on HoloLens as texture on a quad.

Currently, we are reading the processed frame back from the texture and sending it via a TCP connection to the HoloLens. Our goal is to replace this connection with WebRTC and, in particular, benefit from hardware encoding/decoding support.

djee-ms commented 4 years ago

@jahnotto thanks for the details. A few comments:

  • I am worried about step 1., as other users have reported that trying to use data channels for high-frequency data (sending the camera position each frame; see #83) was not working well, most likely due to the buffering that the data channels do at the SRTP protocol level. Did you not observe any such issue?
  • For step 6., have you looked into video multiplexing? It seems this is the way forward to send metadata associated with a video frame and ensure synchronization. Again, see #83 and especially this comment, although as pointed out this would require some work from us, if at all feasible. But pointing out the option anyway.

Otherwise it seems there is no major concern for the external video track feature. Your use case should work with it as currently designed.

@chrisse27 thanks for the update too. Can I confirm what you mean by "process each frame" in your case? Is that done on the CPU side, or on the GPU using shaders? Grabbing a frame from the camera (VRAM), pulling it down to CPU memory for processing, and immediately re-uploading it to VRAM for hardware encoding, for example, would be a performance issue. This is incidentally what currently happens for H.264 on UWP, and what we want to fix to get CPU usage and thermals lower. If you stay in system memory, though, and use a software encoder (VP8, VP9), it won't matter.
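For context on that VRAM-to-CPU round trip: on the Unity side, a readback of a processed RenderTexture typically looks like the sketch below, using Unity's AsyncGPUReadback so the render thread isn't stalled, although the frame still ends up in system memory. This is generic Unity code, not MixedReality-WebRTC API.

```csharp
// Generic Unity sketch: asynchronously copy a GPU-processed RenderTexture
// into CPU memory. From there the pixels could be handed to a software
// encoder or (eventually) a custom video source.
using UnityEngine;
using UnityEngine.Rendering;

public class RenderTextureReadback : MonoBehaviour
{
    public RenderTexture processedFrame; // output of the GPU processing shaders

    void Update()
    {
        // Queue an asynchronous copy of the texture into CPU memory.
        // (A real app would throttle this rather than queue one per frame.)
        AsyncGPUReadback.Request(processedFrame, 0, TextureFormat.RGBA32, OnReadback);
    }

    void OnReadback(AsyncGPUReadbackRequest request)
    {
        if (request.hasError)
        {
            Debug.LogWarning("GPU readback failed");
            return;
        }
        // RGBA32 pixels now live in system memory.
        var pixels = request.GetData<byte>();
        Debug.Log($"Read back {pixels.Length} bytes");
    }
}
```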

iamrohit1 commented 4 years ago

@jahnotto my application has a similar pipeline right now to render remotely on a PC. You might want to take a look at 3D Streaming Toolkit which aims to solve a similar problem.

chrisse27 commented 4 years ago

@djee-ms In our application, the processing is done via GPU using shaders.

jahnotto commented 4 years ago

@jahnotto thanks for the details. A few comments:

  • I am worried about step 1., as other users have reported that trying to use data channels for high-frequency data (sending the camera position each frame; see #83) was not working well, most likely due to the buffering that the data channels do at the SRTP protocol level. Did you not observe any such issue?
  • For step 6., have you looked into video multiplexing? It seems this is the way forward to send metadata associated with a video frame and ensure synchronization. Again, see #83 and especially this comment, although as pointed out this would require some work from us, if at all feasible. But pointing out the option anyway.

Step 1: I am worried too :) So far it seems fine, but I haven't done any real performance/latency testing because the way we encode frames now (frame-wise JPEGs) introduces very high latency anyway. I read through #83 earlier, after you mentioned it in a reply on Slack. It seems like the proposed solutions there (hacking the RTP header) require that I already send a video stream from the HoloLens to the PC. In my case, I'm only sending camera/frustum settings plus any interaction data like clipping planes etc.

Step 6: Thanks, multiplexing seems to be a good solution to sending metadata. Will I be able to use a multiplexed codec through MR-WebRTC though?

jahnotto commented 4 years ago

@jahnotto my application has a similar pipeline right now to render remotely on a PC. You might want to take a look at 3D Streaming Toolkit which aims to solve a similar problem.

I had a look at it some time ago. I gave up on it because it doesn't support the Unity editor and it doesn't build for ARM. MR-WebRTC seemed more promising.

djee-ms commented 4 years ago

Step 6: Thanks, multiplexing seems to be a good solution to sending metadata. Will I be able to use a multiplexed codec through MR-WebRTC though?

I didn't try it to be honest, though it should be like any other codec and work out of the box. The issue is that no metadata API is exposed, so it would require some work to surface something in MR-WebRTC, which might be troublesome according to that comment from #83:

But the base WebRTC code currently has no public APIs to provide metadata input to encoding and extract it again upon receipt of each frame.

Also, there is the (unconfirmed) absence of SDP codec negotiation as mentioned on #83. Would that work for your use case? Can you assume an encoding is supported on both sides?

jahnotto commented 4 years ago

Step 6: Thanks, multiplexing seems to be a good solution to sending metadata. Will I be able to use a multiplexed codec through MR-WebRTC though?

I didn't try it to be honest, though it should be like any other codec and work out of the box. The issue is that no metadata API is exposed, so it would require some work to surface something in MR-WebRTC, which might be troublesome according to that comment from #83:

But the base WebRTC code currently has no public APIs to provide metadata input to encoding and extract it again upon receipt of each frame.

Also, there is the (unconfirmed) absence of SDP codec negotiation as mentioned on #83. Would that work for your use case? Can you assume an encoding is supported on both sides?

I have full control of the hardware and software on both sides. Hence it should be safe to assume that a specific encoding is supported on both sides.

Owaiskb commented 3 years ago

+1 This would be very helpful. I have a native plugin set up in Unity which takes the render texture data and encodes it using nvencode, outputting raw H.264 packets. Is there a way to tap these into the stream?

Hey, how did you encode the render texture? Can you please explain how you encoded it to H.264, and does it use the GPU for encoding? This would be a big help 🙏