microsoft / MixedReality-WebRTC

MixedReality-WebRTC is a collection of components that help mixed reality app developers integrate audio and video real-time communication into their applications and improve their collaborative experience.
https://microsoft.github.io/MixedReality-WebRTC/
MIT License

Support for Insertable Streams #305

Open aboba opened 4 years ago

aboba commented 4 years ago

Chrome (and Edge) WebRTC implementations now have experimental support for the Insertable Streams API.

This API makes it possible to communicate metadata along with audio and video, keeping it in sync.

A presentation on the API can be found here.

An article on its use in AR is here.

Is it possible to support this API, given that it requires passing parameters within RTCConfiguration?
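For context, the browser-side shape of the API during the Origin Trial looks roughly like the sketch below. The `encodedInsertableStreams` flag and the `createEncodedStreams()` surface were experimental at the time, so the wiring is shown as comments and should be treated as an illustration; the metadata framing itself is plain byte manipulation and is given as standalone helpers (the 2-byte length trailer layout is an assumption for this sketch, not a defined format).

```typescript
// Pure helpers: append application metadata after the encoded frame payload,
// with a 2-byte little-endian length trailer so the receiver can strip it off.
function appendMetadata(frame: ArrayBuffer, meta: Uint8Array): ArrayBuffer {
  const out = new Uint8Array(frame.byteLength + meta.byteLength + 2);
  out.set(new Uint8Array(frame), 0);
  out.set(meta, frame.byteLength);
  // Trailer stores the metadata length so the receiver knows where it starts.
  new DataView(out.buffer).setUint16(out.byteLength - 2, meta.byteLength, true);
  return out.buffer;
}

function stripMetadata(data: ArrayBuffer): { frame: ArrayBuffer; meta: Uint8Array } {
  const view = new DataView(data);
  const metaLen = view.getUint16(data.byteLength - 2, true);
  const frameLen = data.byteLength - 2 - metaLen;
  return {
    frame: data.slice(0, frameLen),
    meta: new Uint8Array(data.slice(frameLen, frameLen + metaLen)),
  };
}

// Browser wiring (experimental surface at the time of this thread; Chrome-only,
// not runnable in Node — `currentPoseBytes()` is a hypothetical app function):
//
//   const pc = new RTCPeerConnection({ encodedInsertableStreams: true } as any);
//   const sender = pc.addTrack(videoTrack, stream);
//   const { readable, writable } = (sender as any).createEncodedStreams();
//   readable
//     .pipeThrough(new TransformStream({
//       transform(chunk, controller) {
//         chunk.data = appendMetadata(chunk.data, currentPoseBytes());
//         controller.enqueue(chunk);
//       },
//     }))
//     .pipeTo(writable);
```

The receiver runs the mirror-image transform on `RTCRtpReceiver`, calling `stripMetadata` before handing the frame back to the decoder, which is how the metadata stays in sync with the video frame it was attached to.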

djee-ms commented 4 years ago

Hi Bernard, thanks for keeping us up-to-date!

We have not been able to upgrade to the latest version of the Google code yet, but we are working on it. The Insertable Streams API looks extremely promising a priori; if it does what I think it does, then I am pretty sure we want to integrate it as soon as possible. However, integrating it is blocked on the upgrade to Google's latest code (ongoing on the experimental/undock branch): we are currently waiting on microsoft/winrtc to reach parity on video capture with what we have today with webrtc-uwp-sdk, which is scheduled to happen soon. Until then we cannot switch to that branch, as we would have no video on UWP, which would be a regression compared to our current master state.

aboba commented 4 years ago

@djee-ms One of the goals of the Insertable Streams Origin Trial is to gather feedback on the API from developers. @alvestrand had a question about how much metadata might need to be inserted within the "Insertable Stream", since this might affect the interaction with congestion control. Are we talking about ~10 B of metadata or ~1 KB?

djee-ms commented 4 years ago

We haven't investigated much, but my intuition is that it is much closer to 1 KB than 10 B. For ~10 B you could already get by with a fake RTP header extension; the biggest problem there is the size limit. A typical use is passing the camera matrix associated with the head pose, so at best 3-4 floats (12-16 B), and I think that is already too big for a single RTP header extension; you would need two. And that is really the bare minimum: I expect developers to make good use of more space.
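The size math behind that comment can be sketched as follows. The payload encodings (32-bit floats, a full 4×4 matrix as the "richer" case) are assumptions for illustration; the 16-byte cap comes from RFC 8285, where the one-byte-header extension element encodes its length in 4 bits as (length − 1), so a single element carries at most 16 bytes of payload:

```typescript
const FLOAT32 = 4; // bytes per 32-bit float

// RFC 8285 one-byte-header extension element: max 16 bytes of payload each.
const ONE_BYTE_EXT_MAX = 16;

const minimalPose = 4 * FLOAT32;      // 3-4 floats as in the comment above: 12-16 B,
                                      // sitting right at the single-element boundary
const cameraMatrix = 16 * FLOAT32;    // a full 4x4 matrix: 64 B
const elementsNeeded = Math.ceil(cameraMatrix / ONE_BYTE_EXT_MAX); // 4 elements
```

So even the bare-minimum pose sits at the edge of what one extension element can carry, and anything richer (a full matrix, or pose plus user data) needs several elements, which is what makes the Insertable Streams path more attractive than header extensions for this use case.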

stephenatwork commented 4 years ago

It would be interesting to know the threshold at which the interaction with congestion control needs to be considered. If there is some strong implementation reason to keep it in the low 100s of bytes, perhaps that would be workable for the scenarios we currently know about (a handful of transforms plus some user data). But as Jerome says, if we can avoid making developers worry about this limit, that would be great.
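To put "low 100s of bytes" in perspective against congestion control, the extra bitrate from per-frame metadata is easy to estimate (the frame rates and sizes below are illustrative numbers, not figures from this thread):

```typescript
// Extra bitrate contributed by fixed-size per-frame metadata.
function metadataBitrateKbps(bytesPerFrame: number, fps: number): number {
  return (bytesPerFrame * 8 * fps) / 1000;
}

metadataBitrateKbps(10, 30);   // 2.4 kbps  -> negligible next to any video bitrate
metadataBitrateKbps(256, 30);  // 61.44 kbps -> noticeable at low video bitrates
metadataBitrateKbps(1024, 30); // 245.76 kbps -> significant, likely worth budgeting
```

At 30 fps, a few hundred bytes per frame amounts to tens of kbps, small relative to a typical video encoding but not free, which is presumably why the congestion-control interaction matters more as the metadata approaches 1 KB.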