youennf opened 3 years ago
I think the complexity of the DD lies mainly in understanding the concepts and how to use them, not in implementing or using it on the SFU.
In that regard, hardcoding the modes supported by W3C WebRTC would not make it much easier (for me it would actually be harder), and interoperability would be worse, since you would have to test each mode to ensure it has been hardcoded identically on both sides.
I agree regarding the "generic descriptor"; we need to define which info is required. It has been discussed to make the DD contain only dependency information (used only when SVC is in use) and to remove the resolution info from it.
In that case we would need a different generic descriptor to carry that information. Google is working on an experiment that could be reused:
Right, it is important to figure out what is needed for both SVC and non-SVC use cases. I like the idea of having a simple solution that can be used to transition existing (mostly non-SVC) deployments to SFrame, and complementing that simple solution for more advanced cases like SVC.
Let's assume the SFU is not using SVC and the dependency descriptor is solely targeted at SVC use cases. The SFU might still want to switch between different streams (based on active speaker, or spatial simulcast). For that, the SFU might want the following information, provided as a header extension:
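A minimal sketch of what such a header extension could look like, assuming a one-byte layout with frame-boundary, keyframe, and layer fields. The field names and bit layout here are hypothetical, not any standardized format:

```python
from dataclasses import dataclass

@dataclass
class FrameInfo:
    """Hypothetical per-packet metadata an SFU could use for stream
    switching without parsing the (encrypted) payload."""
    start_of_frame: bool  # first packet of a frame
    end_of_frame: bool    # last packet of a frame
    independent: bool     # frame decodable on its own (e.g. a keyframe)
    temporal_id: int      # 0..7
    spatial_id: int       # 0..3

    def serialize(self) -> bytes:
        # Pack the five fields into a single byte: S E I TTT LL
        b = ((self.start_of_frame << 7) | (self.end_of_frame << 6)
             | (self.independent << 5) | ((self.temporal_id & 0x7) << 2)
             | (self.spatial_id & 0x3))
        return bytes([b])

    @staticmethod
    def parse(data: bytes) -> "FrameInfo":
        b = data[0]
        return FrameInfo(bool((b >> 7) & 1), bool((b >> 6) & 1),
                         bool((b >> 5) & 1), (b >> 2) & 0x7, b & 0x3)

# Round trip: serialize on the sender, parse on the SFU.
info = FrameInfo(True, False, True, temporal_id=1, spatial_id=0)
assert FrameInfo.parse(info.serialize()) == info
```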
I think it would be worthwhile to also solve the problem of signaling the simulcast layers: https://bugs.chromium.org/p/webrtc/issues/detail?id=5207#c46
Not sure if in this same extension, or in a different one.
As pointed out by @murillo128, AV1 defines a dependency descriptor that could potentially be applied to more than AV1. This descriptor is quite complex, though.
In the context of WebRTC, it seems this complexity could be reduced a bit. Say the scalability mode is set to L1T2 using https://w3c.github.io/webrtc-svc/. Since both sender and receiver know that, they could populate the 4 useful templates and start using them without actually having to transmit the template definitions. Note, though, that if the sender dynamically changes mode, that information might need to be sent to the receiver.
While this limits flexibility, that might not be a practical concern, and it could reduce complexity and improve interoperability.
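The idea above can be sketched as follows: both endpoints derive the layer structure from the negotiated webrtc-svc scalabilityMode string, so no template table needs to travel over the wire. The regex and return shape are illustrative assumptions; the actual template contents are codec-defined and not modeled here:

```python
import re

def parse_scalability_mode(mode: str) -> dict:
    """Derive spatial/temporal layer counts from a webrtc-svc
    scalabilityMode string such as 'L1T2' or 'L3T3_KEY'.
    Illustrative only; covers L-prefixed SVC modes."""
    m = re.fullmatch(r"L(\d)T(\d)(?:_KEY(?:_SHIFT)?|h)?", mode)
    if not m:
        raise ValueError(f"unrecognized scalabilityMode: {mode}")
    return {"spatial_layers": int(m.group(1)),
            "temporal_layers": int(m.group(2))}

# Both sender and receiver run the same derivation locally, so the
# template definitions never need to be transmitted.
assert parse_scalability_mode("L1T2") == {"spatial_layers": 1,
                                          "temporal_layers": 2}
```

If the sender changes mode at runtime, only the new mode string (a few bytes) would need to be signaled, after which both sides re-derive the templates.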
The second thing to note is that SFUs sometimes parse the media content themselves to get information such as codec profiles or frame resolutions, which is impossible if the content is encrypted. Either some part of the content needs to be transmitted unencrypted, or the same information should be made available elsewhere, for instance in the generic descriptor.
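To illustrate why the SFU needs this metadata outside the encrypted payload, here is a sketch of a simulcast forwarding decision driven purely by descriptor fields. The `DescriptorInfo` fields and the switching policy are assumptions for illustration, not an existing API:

```python
from dataclasses import dataclass

@dataclass
class DescriptorInfo:
    """Hypothetical per-stream metadata carried unencrypted alongside an
    SFrame-encrypted payload, since the SFU cannot parse the media."""
    stream_id: int
    width: int
    height: int
    independent: bool  # frame decodable without references

def choose_stream(descriptors, max_height, current_stream):
    """Pick the highest-resolution simulcast stream that fits the
    receiver's limit, switching streams only at an independent frame."""
    candidates = [d for d in descriptors if d.height <= max_height]
    if not candidates:
        return current_stream
    best = max(candidates, key=lambda d: d.height)
    if best.stream_id != current_stream and not best.independent:
        return current_stream  # wait for a decodable switch point
    return best.stream_id
```

The point is simply that every field the decision logic touches must come from the descriptor, because the payload itself is opaque to the SFU.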
That raises the question of whether the dependency descriptor can express all the necessary information, or whether additional information would be useful.