open-webrtc-toolkit / owt-client-native

Open WebRTC Toolkit client SDK for native Windows/Linux/iOS applications.
https://01.org/open-webrtc-toolkit
Apache License 2.0

How to convert NativeHandleBuffer to I420Buffer? #497

Open lnogueir opened 3 years ago

lnogueir commented 3 years ago

I am currently using the MSDKVideoDecoderFactory but my video sink only supports frames in the I420 pixel format.

My issue is that NativeHandleBuffer gives me kNative frames and does not implement the ToI420() method; it just returns nullptr.

Can someone give me some guidance on how I would go about converting these kNative frames to a proper I420 frame? What is the underlying pixel format of these kNative frames? Is it NV12?

Thanks!

JamesTerm commented 3 years ago

I'll take a stab at your second question, as I am working on something in this area. The native frame is a compressed frame (e.g. H.264, VP9, H.265), whichever codec the SDP negotiates. The offer starts out with every supported codec and then gets filtered down to the preferences that were pushed in via

 ClientConfiguration.video_encodings.push_back(video_params)

on the P2PClientConfiguration. This means that if you want some control over what you support, you can filter this list down, and I believe the entries are prioritized in order (I am currently trying to verify this for some work I need to do). The ones I am interested in are H.264 first and then VP9 (and eventually H.265). This SDP negotiation reminds me a lot of how HDMI and EDID negotiations work. Hope this helps. A fair word of warning: if you limit your codec list, it is possible to run into a device that doesn't support anything you offer, in which case the SDP negotiation will log an error and that device will not be able to publish.
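Roughly, a filtered configuration looks something like the sketch below. This is only a sketch: the type names, the VideoCodec enum values, and the VideoEncodingParameters constructor arguments are what I see in the 4.x owt/base/commontypes.h and may differ between branches.

// Header locations assumed from the 4.x SDK layout.
#include "owt/base/commontypes.h"
#include "owt/p2p/p2pclient.h"

owt::p2p::P2PClientConfiguration MakeConfiguration() {
  owt::p2p::P2PClientConfiguration configuration;

  // Offer H.264 first, then VP9; codecs not pushed here are filtered out of
  // the offer. The constructor arguments are assumed to be
  // (codec, max bitrate in bps, hardware accelerated), with 0 meaning no cap.
  owt::base::VideoCodecParameters h264_codec;
  h264_codec.name = owt::base::VideoCodec::kH264;
  configuration.video_encodings.push_back(
      owt::base::VideoEncodingParameters(h264_codec, 0, true));

  owt::base::VideoCodecParameters vp9_codec;
  vp9_codec.name = owt::base::VideoCodec::kVp9;
  configuration.video_encodings.push_back(
      owt::base::VideoEncodingParameters(vp9_codec, 0, false));

  return configuration;
}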

One thing I am interested in is which branch the SDK starts offering native frames in. 4.3? This will be something I need to look at soon as well.

lnogueir commented 3 years ago

@JamesTerm Thanks for your reply.

Are you sure that the NativeHandleBuffer is a compressed frame? Since it inherits from webrtc::VideoFrameBuffer, isn't it supposed to be a decoded frame?

I am saying that because the classes that implement webrtc::VideoFrameBuffer, like webrtc::I420Buffer and webrtc::NV12Buffer, all represent uncompressed raw video frames.

JamesTerm commented 3 years ago

I will find out, because in 4.2 I get native frames here:

void VideoReceiveStream::OnFrame(const VideoFrame& video_frame)

I want to see what these are in 4.3... I assumed they are native frames from the name itself, but I will need to map it out to be certain and compare against what I know from 4.2. I'm currently having issues with the 4.3 decoder failing during init for H.264, so I am hoping to bypass it. I'm hoping to gain some clarity on this today.

JamesTerm commented 3 years ago

OK, I reviewed my code, and VideoFrameBuffer is indeed what I use for native frames, like so:

// GetINative() comes from my own native-buffer interface added in 4.2; it is
// not a stock WebRTC call.
const rtc::scoped_refptr<VideoFrameBuffer> video_frame_buffer = frame.video_frame_buffer();
const INativeBufferInterface* frame_info = video_frame_buffer->GetINative();

During 4.2 I had to make my own native interface setup, so this is tested and confirmed, and it matches the stated purpose of native frames as indicated in the header:

// The external client can produce such
// native frame buffers from custom video sources, and then cast it back to the
// correct subclass in custom video sinks. The purpose of this is to improve
// performance by providing an optimized path without intermediate conversions.

So this is where it starts, with the SDP negotiations mentioned previously. I am still looking at 4.3 to see how to use this newer interface. I know how these frames were decoded before, but branch 76 has changed drastically from 70, so I will need to see how it works there.
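For what it's worth, the cast-back that the header describes looks roughly like this in a custom sink. This is only a sketch: I am assuming the owt::base::NativeHandleBuffer class and a native_handle() accessor from the SDK's nativehandlebuffer.h, and the header path may vary by branch.

#include "api/video/video_frame.h"
#include "talk/owt/sdk/base/nativehandlebuffer.h"  // header path may differ by branch

void HandleFrame(const webrtc::VideoFrame& frame) {
  rtc::scoped_refptr<webrtc::VideoFrameBuffer> buffer =
      frame.video_frame_buffer();
  if (buffer->type() == webrtc::VideoFrameBuffer::Type::kNative) {
    // Cast back to the subclass the source created and pull out the raw
    // handle (a decoder surface or texture) for the optimized GPU path.
    auto* native_buffer =
        static_cast<owt::base::NativeHandleBuffer*>(buffer.get());
    void* handle = native_buffer->native_handle();
    (void)handle;  // hand this to a device-specific renderer
  }
}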

lnogueir commented 3 years ago

What I am talking about is that once the image is decoded, we get a webrtc::VideoFrameBuffer, albeit a custom implementation named NativeHandleBuffer, and that must be a raw video frame with some underlying pixel format.

Because if we look at video_frame_buffer.h:

// Base class for frame buffers of different types of pixel format and storage.
// The tag in type() indicates how the data is represented, and each type is
// implemented as a subclass. To access the pixel data, call the appropriate
// GetXXX() function, where XXX represents the type. There is also a function
// ToI420() that returns a frame buffer in I420 format, converting from the
// underlying representation if necessary. I420 is the most widely accepted
// format and serves as a fallback for video sinks that can only handle I420,
// e.g. the internal WebRTC software encoders. A special enum value 'kNative' is
// provided for external clients to implement their own frame buffer
// representations, e.g. as textures. The external client can produce such
// native frame buffers from custom video sources, and then cast it back to the
// correct subclass in custom video sinks. The purpose of this is to improve
// performance by providing an optimized path without intermediate conversions.
// Frame metadata such as rotation and timestamp are stored in
// webrtc::VideoFrame, and not here.
class RTC_EXPORT VideoFrameBuffer : public rtc::RefCountInterface {
 public:
  // New frame buffer types will be added conservatively when there is an
  // opportunity to optimize the path between some pair of video source and
  // video sink.
  enum class Type {
    kNative,
    kI420,
    kI420A,
    kI444,
    kI010,
    kNV12,
  };

So I am trying to get the underlying pixel format of the video frame, because "native" doesn't tell me what it is, and my application can only handle the I420 pixel format.
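For reference, my sink follows roughly the pattern below (hypothetical class name, plain WebRTC API, nothing OWT-specific); the kNative branch is exactly where I get stuck, because that is where ToI420() returns nullptr:

#include "api/scoped_refptr.h"
#include "api/video/video_frame.h"
#include "api/video/video_frame_buffer.h"
#include "api/video/video_sink_interface.h"

class I420OnlySink : public rtc::VideoSinkInterface<webrtc::VideoFrame> {
 public:
  void OnFrame(const webrtc::VideoFrame& frame) override {
    rtc::scoped_refptr<webrtc::VideoFrameBuffer> buffer =
        frame.video_frame_buffer();
    if (buffer->type() == webrtc::VideoFrameBuffer::Type::kNative) {
      // NativeHandleBuffer lands here: ToI420() returns nullptr, so the
      // underlying surface would have to be mapped and copied out by hand.
      return;
    }
    // Any other type (kI420, kNV12, ...) can be converted generically.
    rtc::scoped_refptr<webrtc::I420BufferInterface> i420 = buffer->ToI420();
    // i420->DataY()/DataU()/DataV() plus the strides are plain I420 planes.
  }
};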

JamesTerm commented 3 years ago

This layout may help find the details of the answer.

When I traced this, the compressed frame makes it to the decoder using this same structure, identified as native, and then within the decoder itself a new frame is created and takes its place as the native frame is destroyed. So the decoder shouldn't be returning native frames; the native type only identifies the frames that come into it. I haven't looked at the details of how the decoders work, because I tested this by setting the render type to I420 and confirmed the frames to be correct in testing. (I could have misread something, so please double-check this.)

One other clue that may help is talk/owt/sdk/base/webrtcvideorendererimpl.cc. It doesn't care about the video_frame_buffer()->type() unless it is native (I am still trying to work out how to force this).

It does care what the renderer type is, so if I set the renderer type to kI420, that is what we are going to get, and the frames will come here before being sent out to the client code. In my tests the frames come in as H.264 or VP9, get decoded to I420, hit these lines of code, and get sent out. These tests are confirmed in 4.2, and the VP9 test is confirmed for 4.3. If we set a breakpoint here we can see the call stack, but this is asynchronous to where the frames get processed in VideoReceiveStream, so we would have to set a breakpoint there as well. This is how I mapped out the layout, from which we should be able to work out the missing pieces.
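On the client side that boils down to attaching a renderer that reports kI420. A rough sketch, assuming the VideoRendererInterface and VideoBuffer declarations from owt/base/videorendererinterface.h (member and enum names may differ between branches):

#include <memory>
#include "owt/base/videorendererinterface.h"

// Declaring Type() as kI420 makes the SDK deliver decoded frames as packed
// I420 data in VideoBuffer::buffer, sized by VideoBuffer::resolution.
class MyI420Renderer : public owt::base::VideoRendererInterface {
 public:
  void RenderFrame(std::unique_ptr<owt::base::VideoBuffer> video_buffer) override {
    int width = video_buffer->resolution.width;
    int height = video_buffer->resolution.height;
    const uint8_t* i420_data = video_buffer->buffer;  // Y plane, then U, then V
    // Hand the planes off to the application's video pipeline here.
    (void)width; (void)height; (void)i420_data;
  }
  owt::base::VideoRendererType Type() override {
    return owt::base::VideoRendererType::kI420;
  }
};

The renderer is attached to the remote stream (if I remember right, via RemoteStream::AttachVideoRenderer), which is the hand-off point to client code I refer to above.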

lnogueir commented 3 years ago

Thank you very much. This information will certainly help me figure this out.

I should start working on this decoding part again soon and will post here when I have updates.

Meonardo commented 2 years ago

Hi @lnogueir, did you have any updates?

Recently I ran into the same situation: I want to pass the YUV data to an external renderer.

Here is what I found: the underlying pixel format of the kNative frames is NV12 (see the Intel Media SDK manual), and mfxFrameData stores the YUV data (see the mfxFrameData definition). Using the code block below I can retrieve the YUV data (see the file msdkvideodecoder.cc):

mfxFrameData frame_data = pOutputSurface->Data;
mfxMemId dxMemId = frame_data.MemId;
mfxFrameInfo frame_info = pOutputSurface->Info;

m_pmfx_allocator_->LockFrame(dxMemId, &frame_data);

/*int w = frame_info.Width;
int h = frame_info.Height;
int stride_uv = (w + 1) / 2;
uint8_t* data_y = frame_data.Y;
uint8_t* data_u = frame_data.U;
uint8_t* data_v = frame_data.V;

rtc::scoped_refptr<webrtc::I420Buffer> i420_buffer =
    webrtc::I420Buffer::Copy(w, h, data_y, w, data_u, stride_uv, data_v,
                             stride_uv);
rtc::scoped_refptr<VideoFrameBuffer> buffer = std::move(i420_buffer);*/

if (callback_) {
  // NV12 case: wrap the locked surface pointer in a NativeHandleBuffer and
  // hand the decoded frame to the callback.
  rtc::scoped_refptr<owt::base::NativeHandleBuffer> buffer =
      new rtc::RefCountedObject<owt::base::NativeHandleBuffer>(
          (void*)frame_data.Y, frame_info.CropW,
          frame_info.CropH);
  webrtc::VideoFrame decoded_frame(buffer, inputImage.Timestamp(), 0,
                                   webrtc::kVideoRotation_0);
  decoded_frame.set_ntp_time_ms(inputImage.ntp_time_ms_);
  decoded_frame.set_timestamp(inputImage.Timestamp());
  callback_->Decoded(decoded_frame);
}

m_pmfx_allocator_->UnlockFrame(dxMemId, &frame_data);
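If you need I420 rather than the native handle, a conversion after LockFrame() could look roughly like the following. This is only a sketch, not SDK code: it assumes the locked surface really is NV12 as described, that frame_data.UV (aliased with frame_data.U) is the interleaved UV plane, that frame_data.Pitch is the stride of both planes, and that CropX/CropY are zero.

// At the top of the file:
// #include "api/video/i420_buffer.h"
// #include "libyuv/convert.h"

int width = frame_info.CropW;
int height = frame_info.CropH;
rtc::scoped_refptr<webrtc::I420Buffer> i420_buffer =
    webrtc::I420Buffer::Create(width, height);
// Repack the two NV12 planes into three I420 planes.
libyuv::NV12ToI420(frame_data.Y, frame_data.Pitch,
                   frame_data.UV, frame_data.Pitch,
                   i420_buffer->MutableDataY(), i420_buffer->StrideY(),
                   i420_buffer->MutableDataU(), i420_buffer->StrideU(),
                   i420_buffer->MutableDataV(), i420_buffer->StrideV(),
                   width, height);
// i420_buffer can then be wrapped in a webrtc::VideoFrame in place of the
// NativeHandleBuffer, so sinks that only understand I420 work unchanged.
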
lnogueir commented 2 years ago

Hi @lnogueir, did you have any updates?

Hey @Meonardo, sorry I do not.