nyanmisaka / ffmpeg-rockchip

FFmpeg with async and zero-copy Rockchip MPP & RGA support
Other
326 stars 48 forks source link

[RFC] HwFramesContext config method support #1

Closed hbiyik closed 5 months ago

hbiyik commented 6 months ago

This is a first attempt to provide a way for mpp and kodi to work with the decoder.

With above changes mpv works, however there still seems to be copies involved it is slower than expected.

I have some ideas on whats going on but I will update when i have a proper understanding

nyanmisaka commented 6 months ago

Can MPV and Kodi handle DRM_PRIME (NV15) natively?

hbiyik commented 6 months ago

i think this is more of task for EGL interface (mesa), what both kodi and mpv does is receive an DRMPRimeFrame, and import this to EGL interface using attribs, so they should both support as long as mesa does, but current mesa-panfork does not, may be ~panfork~ panthor will do.

On the otherhand, kodi can bypass EGL and directly render on DRM planes thorugh kernel but thats always buggy, i dont use it.

nyanmisaka commented 6 months ago

Then the following options are available:

hbiyik commented 6 months ago

Then the following options are available:

  • Modify MPV and Kodi to allow them to use RGA filters.

not maintainable

  • Glue RGA and MPP_DEC together in FFmpeg.

i already tried, this works to certain extend with spagetti code

  • Add proper RGA support in MPP libs, allowing MPP_DEC to output converted images, just like IEP de-interlacing.

i doubt this will ever happen

  • Wait for panthor to support NV15.

i think this is the only reasonable, future-proof way forward at the moment. Meanwhile i can try to take advantage of HwFramesContext config method to take advantage of NV12/NV16 DRM Prime frames

hbiyik commented 6 months ago

What i am trying to do in this PR is:

When you define the HWAccel of the decoder with flag AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX, the client is responsible for creating creating the device and providing its context. Since RKMPP device is newly invented no client supports this.

However at least for decoder (not rga related things), RKMPP device is actually a DRM device its own quirks. There are 2 ways to accomplish the compliancy.

  1. Provide multiple HWDeviceConfigs:

See .hwconfigs is actually an array, so one decoder can have multiple hwaccel device. So it is possible to add another hwaccel here but with .device_type = AV_HWDEVICE_TYPE_DRM. However i think FFMpeg has some issues there. Ie:

ff_get_format is actually only checking the first matched HWAccel but not the array and quits here. I think this is either a bug, or multiple hwaccel device is not a very common practice so i did not go this path.

  1. Allow client to use HWFramesCtx.

When the decoder claims that it supports AV_CODEC_HW_CONFIG_METHOD_HW_FRAMES_CTX method, the client does not necessarily need to give the device context, instead, codec should create its own hwdevice context, and the client will receive DRMPRIME frames over the hw_frames_ctx. This fits better to this decoder and this is what i am trying to implement.

nyanmisaka commented 6 months ago

ff_get_format is actually only checking the first matched HWAccel but not the array and quits here. I think this is either a bug, or multiple hwaccel device is not a very common practice so i did not go this path.

Generally speaking, in FFmpeg whenever a new hardware acceleration is added, a corresponding HW pix_fmt is also added, and frames_derive_to/from is used to import/export to other underlying HW pix_fmt. So it is a rare usage when you want to add hwaccel for rkmppdec but still borrow AV_PIX_FMT_DRM_PRIME.

nyanmisaka commented 6 months ago

Manually merged in 6584518

hbiyik commented 6 months ago

ooops, please do not merge yet, this hardly work as of now.

hbiyik commented 6 months ago

one question: https://github.com/nyanmisaka/ffmpeg-rockchip/commit/49cc93e8925eea1623cc6e2b0202a9a3225e11ec

is this a good idea? those headers are exported to the system and it will cause all drm based other software to be recompiled due to this change. basically an API change for drm interface here.

hbiyik commented 6 months ago

May be use one layer with a specific format like FOURCC("MBUF") and assign it to the some object index, and get the ptr from the mppbuffer? Still a hack but a non breaking one.

hbiyik commented 6 months ago

Or add the new data to the end of structure, so the old interface variables will not be repostioned

UPDATE: it will not help to move to the end of AVDRMObjectDescriptor, new stuff should be at the end of AVDRMFrameDescriptor, because it has multiple members of AVDRMObjectDescriptors, and the new oversized AVDRMObjectDescriptor will overwrite the .nb_planes member.

Even if the ptr and opaque would be a member of AVDRMObjectDescriptor, it would only work when AVDRMObjectDescriptor is not used as an array. But that unconventional, a cleaner way would be to use layers may be...

nyanmisaka commented 6 months ago

Yeah it’s indeed a breaking change for upstreaming. So, either append an new array after layers to correspond to the objects, or create a new format AV_PIX_FMT_RKMPP instead.

nyanmisaka commented 6 months ago

It would be nice if they could reserve some bits when defining AVDRMObjectDescriptor. It's not very intuitive to separate ptr and opaque from it.

hbiyik commented 6 months ago

with some code clean up, there will be no need for MppBuffer object, then only thing necessary is to check if fd is mmapped or not (thats why u need ptr i think).

I remember there was a way to do this without any flags and just by mmap but i need to dig in little bit.

Particularly below part, does it really have to check desc->objects[i].ptr?

static int rkmpp_map_frame(AVHWFramesContext *hwfc,
                           AVFrame *dst, const AVFrame *src, int flags)
{
 ....
    av_assert0(desc->nb_objects <= AV_DRM_MAX_PLANES);
    for (i = 0; i < desc->nb_objects; i++) {
        if (desc->objects[i].ptr) {
            addr = desc->objects[i].ptr;
            map->unmap[i] = 0;

i think you are doing an optimization above to prevent unnecessary unmap and mmaps.

In that case may be we can introduce a format modifier to flag the descriptor as mmapped or not.

hbiyik commented 6 months ago

ok i created a seperate Pr for this, i think this is the cleaniest way, no hacks involved.

hbiyik commented 6 months ago

this works pretty fine. here is the below command to get mpv working:

--vo=gpu-next is optional but it seems it works rather better --swapchain-depth is optional but helps to sync the buffers better

simple drm_prime output: mpv --swapchain-depth=8 --vo=gpu-next --hwdec=rkmpp pathtofile

avg hevc performance: 8bit: fps= 8k@30fps 10bit: not supported

AFBC->non afbc (10bit to 8bit) mpv --swapchain-depth=8 --vo=gpu-next --vd-lavc-o=buf_mode=ext,afbc=on --vf=scale_rkrga=format=nv12 --hwdec=rkmpp pathtofile

avg hevc performance: 8bit afbc -> 8bit: fps= 8k@60fps 10bit afbc->8bit: 8k@60fps

AFBC->non afbc (HDR to HDR) mpv --swapchain-depth=8 --vo=gpu-next --vd-lavc-o=buf_mode=ext,afbc=on --vf=scale_rkrga=format=p010 --hwdec=rkmpp pathtofile

avg hevc performance: 10bit afbc-> 10bit: 8k@30fps

mpv requires below path to play P010 drm frames.

diff --git a/video/out/hwdec/hwdec_drmprime.c b/video/out/hwdec/hwdec_drmprime.c
index 290f11c..9c63ab4 100644
--- a/video/out/hwdec/hwdec_drmprime.c
+++ b/video/out/hwdec/hwdec_drmprime.c
@@ -119,6 +119,7 @@
     MP_TARRAY_APPEND(p, p->formats, num_formats, IMGFMT_NV12);
     MP_TARRAY_APPEND(p, p->formats, num_formats, IMGFMT_420P);
     MP_TARRAY_APPEND(p, p->formats, num_formats, pixfmt2imgfmt(AV_PIX_FMT_NV16));
+    MP_TARRAY_APPEND(p, p->formats, num_formats, pixfmt2imgfmt(AV_PIX_FMT_P010));
     MP_TARRAY_APPEND(p, p->formats, num_formats, 0); // terminate it

     p->hwctx.hw_imgfmt = IMGFMT_DRMPRIME;

may be it would be a good idea to add a format option like 8bit for scale_rkrga so that it will auto select the correct 8bit variant of a 10bit frame variant, ie: NV15->NV12 and NV20->NV16

nyanmisaka commented 6 months ago

may be it would be a good idea to add a format option like 8bit for scale_rkrga so that it will auto select the correct 8bit variant of a 10bit frame variant, ie: NV15->NV12 and NV20->NV16

The swscale format filter of ffmpeg can accept multiple pixel formats and choose the appropriate one, like this format=nv12|nv16. So it is possible to modify the rkrga filter options and parse them manually.

hbiyik commented 6 months ago

fmt1|fmt2 looks cleaner.

One another thing, mesa EGL requires 32 byte aligned frames however decoder outputs 16byte aligned frames and this causes drm prime frames not to be imported properly and the screen outputs green or blue.

Is it possible to add a hstride option to scale_rkrga filter? so that after rga chain the frame could be imported to EGL properly.

nyanmisaka commented 6 months ago

Is it possible to add a hstride option to scale_rkrga filter? so that after rga chain the frame could be imported to EGL properly.

Sure.

https://github.com/nyanmisaka/ffmpeg-rockchip/blob/9e7314093f635b2641c15e756ec45d2a9bc9f0b9/libavutil/hwcontext_rkmpp.c#L245

Will this bypass the panfork issue?

layer->planes[i].offset =
    layer->planes[i-1].offset +
    layer->planes[i-1].pitch * (FFALIGN(hwfc->height, 64) >> (i > 1 ? pixdesc->log2_chroma_h : 0));
hbiyik commented 6 months ago

No, i just tested it. Because having a different pitch value for the buffer will only change the buffer size and offsets, however, those initial values will be overwritten by mpp like here.

To my knowledge mpp does not support preconfigured strides/pitches. If it would, may be setting the hstride of the mppframe before decode_get_frame by manually allocing the input mppframe would do the trickjust by the decoder. But thats just a theory, i was working it around with realigning to 64byte hstride with rga on next chain.

hbiyik commented 6 months ago

i think this is better to be handled by mpp, i asked them if they can support a decoding flow like this, may be thorugh some decccfg..

https://github.com/rockchip-linux/mpp/issues/507

nyanmisaka commented 6 months ago

i think this is better to be handled by mpp, i asked them if they can support a decoding flow like this, may be thorugh some decccfg..

rockchip-linux/mpp#507

We really need this. For the H264 High10 use case, MPP is using hor_align=16 for both NV12 and NV15, while RGA3 requires hor_align=64 for NV15.

https://github.com/nyanmisaka/mpp/commit/a9869977a79863d1988afdfcba00f24200c38ffe

hbiyik commented 5 months ago

@nyanmisaka i tested this extensively, from my pov it is ready to be merged, there is no side affects until now i could see.

nyanmisaka commented 5 months ago

Merged in d0a1484