versatica / mediasoup

Cutting Edge WebRTC Video Conferencing
https://mediasoup.org
ISC License

Implement AV1 codec #512

Open ibc opened 3 years ago

ibc commented 3 years ago

More random info:

vpalmisano commented 3 years ago

Running Chromium with --force-fieldtrials="WebRTC-Vp9DependencyDescriptor/Enabled/WebRTC-DependencyDescriptorAdvertised/Enabled" also enables the DependencyDescriptor extension for VP9. Tested with Chromium 91.0.4434.0.

jmillan commented 3 years ago

@vpalmisano,

Can you please describe the current state of the implementation and the decisions taken so far? Feel free to point to the relevant references (specs, etc.).

vpalmisano commented 3 years ago

Current state:

  1. I've added the AV1X descriptors to the supportedRtpCapabilities.
  2. I've added worker/src/RTC/Codecs/AV1X.hpp and worker/src/RTC/Codecs/AV1X.cpp (https://github.com/versatica/mediasoup/commit/e0ded5e9b702fa65046328089a65cd4363e05a1a), which are basically a copy of the VP9 classes with an initial parser for the OBU format used by AV1. For a single AV1 stream, the parser checks whether the packet is the first of a coded video sequence (https://aomediacodec.github.io/av1-spec/av1-spec.pdf, section 5.3.1); if so, the packet is marked as a keyframe.
  3. As pointed out in https://medooze.medium.com/mastering-the-av1-svc-chains-a4b2a6a23925, using the DependencyDescriptor header extension is the correct/preferred way to handle an SVC configuration. AFAIK, this is also required for handling SFrame configurations, where the actual video content is encrypted on the client side and parsing is not possible on the SFU side.
  4. I've found that with the field trials mentioned above enabled in Chromium, the DependencyDescriptor extension is added (this also works for the VP8 and VP9 codecs).
  5. In the latest commits (https://github.com/vpalmisano/mediasoup/blob/b6e2bb253935333b8d504c1cf085a5e4a0c1c2ab/worker/src/RTC/RtpPacket.cpp#L962) I've started implementing a DependencyDescriptor parser. The current implementation parses each RTP packet independently, but from the specification (https://aomediacodec.github.io/av1-rtp-spec/#a1-introduction) it is clear that we need to keep a stateful object that is incrementally updated as each RTP packet arrives.
  6. The Chromium implementation, for reference: https://chromium.googlesource.com/external/webrtc/+/refs/heads/master/modules/rtp_rtcp/source/rtp_dependency_descriptor_writer.cc and https://chromium.googlesource.com/external/webrtc/+/refs/heads/master/modules/rtp_rtcp/source/rtp_dependency_descriptor_reader.cc
  7. With this patch (https://chromium-review.googlesource.com/c/chromium/src/+/2623011) the SVC layers are advertised in the SDP, but I get this warning from chromium: WARNING:libaom_av1_encoder.cc(182)] Scalability mode is not set, using 'NONE'.
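
For the single-stream keyframe check described in item 2, one possible shortcut is the N bit of the AV1 aggregation header, which the AV1 RTP specification sets on the first packet of a coded video sequence. The following is only a minimal sketch under that assumption; `Av1AggregationHeader`, `ParseAggregationHeader` and `IsKeyFrameStart` are hypothetical names and are not taken from the linked commit.

```cpp
#include <cstddef>
#include <cstdint>

// One-byte aggregation header that starts every AV1 RTP payload:
//   bit 7: Z, bit 6: Y, bits 5-4: W, bit 3: N, bits 2-0: reserved.
struct Av1AggregationHeader
{
    bool z;    // First OBU element continues an OBU from the previous packet.
    bool y;    // Last OBU element continues in the next packet.
    uint8_t w; // Number of OBU elements (0 means each one has a length field).
    bool n;    // Packet is the first packet of a coded video sequence.
};

// Hypothetical helper: returns false if the payload is too short to parse.
bool ParseAggregationHeader(const uint8_t* payload, size_t len, Av1AggregationHeader& out)
{
    if (payload == nullptr || len < 1)
        return false;

    const uint8_t b = payload[0];

    out.z = (b & 0x80) != 0;
    out.y = (b & 0x40) != 0;
    out.w = static_cast<uint8_t>((b >> 4) & 0x03);
    out.n = (b & 0x08) != 0;

    return true;
}

// Hypothetical helper: true when the packet starts a new coded video sequence,
// which is treated here as the start of a keyframe.
bool IsKeyFrameStart(const uint8_t* payload, size_t len)
{
    Av1AggregationHeader header;

    return ParseAggregationHeader(payload, len, header) && header.n;
}
```

This only looks at the unencrypted first byte of the payload, so it does not replace the DependencyDescriptor-based approach for SVC discussed in item 3.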

jmillan commented 3 years ago

Thanks for the update @vpalmisano, it's looking awesome!

jmillan commented 3 years ago

Hi @vpalmisano, if there is anything we can help with, let us know.

vpalmisano commented 3 years ago

I'm digging into the code and AFAIK the best place to collect the DD header information is inside the EncodingContext class. I'm trying to figure out how this information can also be used for the VP8 and VP9 codecs, because with this header extension we could also support the SFrame approach.
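
To illustrate what "collecting" the DD information per stream could mean, here is a rough sketch of a stateful object that remembers the last header extension that carried a template dependency structure. It is an assumption-laden illustration: `DependencyDescriptorState` and its methods are hypothetical, it stores the raw bytes instead of a parsed structure, and it does not reproduce mediasoup's real EncodingContext interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-stream state, updated once per received RTP packet.
class DependencyDescriptorState
{
public:
    // `data`/`len` is the raw DD header extension of the packet.
    // `structurePresent` is its template_dependency_structure_present_flag,
    // assumed to have been read by the caller.
    // Returns true if a template structure is known after this packet, i.e.
    // packets that only reference a template id could be interpreted.
    bool Update(const uint8_t* data, size_t len, bool structurePresent)
    {
        // A descriptor with extended fields is longer than the 3 mandatory bytes.
        if (structurePresent && data != nullptr && len > 3)
            lastStructureExtension.assign(data, data + len);

        return HasStructure();
    }

    bool HasStructure() const
    {
        return !lastStructureExtension.empty();
    }

    // Raw bytes of the last extension that carried a structure.
    const std::vector<uint8_t>& LastStructureExtension() const
    {
        return lastStructureExtension;
    }

private:
    std::vector<uint8_t> lastStructureExtension;
};
```

Until `Update()` has seen a packet with a structure, packets that only reference a template id cannot be mapped to spatial or temporal layers, which is why a per-packet stateless parser is not enough.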

agouaillard commented 3 years ago

I would not use DD with H.264 and VP8, and I do not believe DD is mandatory for SFrame. You can use SFrame with a header extension other than DD, e.g. with H.264 and VP8. The two are orthogonal: SFrame is used on the sender and receiver side, while DD or an equivalent is used by intermediary SFUs. They serve different purposes.

vpalmisano commented 3 years ago

@agouaillard In the mediasoup case we need to know, for each incoming RTP packet, at least whether it belongs to a keyframe or not. Currently this is done by inspecting the video payload. With SFrame the payload could be encrypted on the client side, so I think the only way to perform this check is to use the DD information.

VP8: https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/Codecs/VP8.cpp#L14
VP9: https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/Codecs/VP9.cpp#L13
H.264: https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/Codecs/H264.cpp#L54

agouaillard commented 3 years ago

When using SFrame the packet content could be encrypted at client side, so I think that the only way to perform this parsing is using the DD information.

You think wrong. This is not the only way.

https://tools.ietf.org/html/draft-ietf-avtext-framemarking-12


jmillan commented 3 years ago

This is not the only way.

Correct. There are different ways to achieve it.

Anyway we are targeting AV1 here. Let's not add SFrame into play in this context please.

vpalmisano commented 3 years ago

This is not the only way.

Correct. There are different ways to achieve it.

Anyway we are targeting AV1 here. Let's not add SFrame into play in this context please.

Yes. Just to close the digression, this is the discussion about the FrameMarking header usage: https://github.com/versatica/mediasoup/issues/298.

vpalmisano commented 3 years ago

Updates: (This is related to Chromium code, so maybe it should be filed as a separate bug report; I'm reporting it here in case anyone can help.)

I've followed the steps described in https://medooze.medium.com/mastering-the-av1-svc-chains-a4b2a6a23925 and applied this patch: https://chromium-review.googlesource.com/c/chromium/src/+/2623011. After this, it seems that the code at https://chromium.googlesource.com/chromium/src/third_party/+/master/blink/renderer/modules/peerconnection/rtc_rtp_sender.cc#352 fails, even if simulcast is enabled on the web side. I've forced the scalability mode by adding these lines:

webrtc_encoding.scalability_mode = "L2T3_KEY";
webrtc_encoding.num_temporal_layers = 3;

Now Chromium is actually using an AV1 simulcast configuration, but the problem I found is that the DependencyDescriptor header no longer contains the template_dependency_structure_present_flag. Inspecting the Chromium code, it seems that here (https://chromium.googlesource.com/external/webrtc/+/HEAD/modules/rtp_rtcp/source/rtp_sender_video.cc#399) the descriptor.attached_structure variable is not null when assigned, but it is null here (https://webrtc.googlesource.com/src/+/lkgr/modules/rtp_rtcp/source/rtp_dependency_descriptor_writer.cc#326) when the same variable should be used for writing the header.
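
One way to check on the receiving side whether a given packet carries the flag is to decode the start of the header extension. The sketch below follows the field order given in the AV1 RTP specification (mandatory_descriptor_fields followed by extended_descriptor_fields when the extension is longer than 3 bytes); `DependencyDescriptorHeader` and `ParseDependencyDescriptorHeader` are hypothetical names, and the template structure itself is not parsed.

```cpp
#include <cstddef>
#include <cstdint>

// Leading fields of a Dependency Descriptor header extension.
struct DependencyDescriptorHeader
{
    bool startOfFrame;
    bool endOfFrame;
    uint8_t templateId;   // frame_dependency_template_id (6 bits).
    uint16_t frameNumber; // frame_number (16 bits).
    bool hasExtendedFields;
    bool templateStructurePresent; // template_dependency_structure_present_flag.
    bool activeDecodeTargetsPresent;
    bool customDtis;
    bool customFdiffs;
    bool customChains;
};

// Hypothetical helper: parses only the mandatory fields and the five flags
// that follow them. Returns false if fewer than 3 bytes are available.
bool ParseDependencyDescriptorHeader(
    const uint8_t* data, size_t len, DependencyDescriptorHeader& out)
{
    if (data == nullptr || len < 3)
        return false;

    out = DependencyDescriptorHeader{}; // All flags default to false.

    out.startOfFrame = (data[0] & 0x80) != 0;
    out.endOfFrame   = (data[0] & 0x40) != 0;
    out.templateId   = static_cast<uint8_t>(data[0] & 0x3F);
    out.frameNumber  = static_cast<uint16_t>((data[1] << 8) | data[2]);

    // Extended fields exist only when the extension is longer than 3 bytes.
    if (len > 3)
    {
        out.hasExtendedFields          = true;
        out.templateStructurePresent   = (data[3] & 0x80) != 0;
        out.activeDecodeTargetsPresent = (data[3] & 0x40) != 0;
        out.customDtis                 = (data[3] & 0x20) != 0;
        out.customFdiffs               = (data[3] & 0x10) != 0;
        out.customChains               = (data[3] & 0x08) != 0;
    }

    return true;
}
```

If every received packet yields `hasExtendedFields == false`, or `templateStructurePresent == false` even on the first packet of a keyframe, that would match the behaviour described above.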

vpalmisano commented 3 years ago

Dependency Descriptor implementation status in Chromium:
https://bugs.chromium.org/p/chromium/issues/detail?id=1196318
https://bugs.chromium.org/p/webrtc/issues/detail?id=11999

SetoKaiba commented 1 year ago

Any update on this?

hakarim740-com-ra commented 1 month ago

Any update on this?