VPP 3D LUT: copying system memory to video memory should convert layout

cyfdecyf commented 11 months ago

I am trying to add support for applying a 3D LUT with QSV VPP in FFmpeg. My previous implementation uses VAAPI to create video memory to hold the 3D LUT. @xhaihao suggested that I use system memory to hold the 3D LUT and let oneVPL copy it into video memory. This is indeed better than using VAAPI directly because it lets oneVPL take care of the video memory. I chose the difficult path because I couldn't find details about how to store a 3D LUT in system memory when I started.

The sample code in oneVPL is an example of using system memory to hold a 3D LUT. (Thanks to @FurongZhang for directing me to this sample.) I followed this sample code in my new implementation, but it does not work: applying a 3D LUT to my test video results in all-black output.

I dug into the oneVPL-intel-gpu source code to see how system memory is copied into video memory. It calls memcpy to copy the separate RGB channel data into mapped video memory without any conversion.

https://github.com/oneapi-src/oneVPL-intel-gpu/blob/a126c456aa8cc8f15af04e05f521705cc61b0668/_studio/shared/src/mfx_vpp_vaapi.cpp#L1161-L1170

As I previously implemented 3D LUT processing using video memory, I can tell this is not going to work. Video memory requires the RGBA channel data to be packed together. I'm copying my code here:

    memset(surface_u16, 0, surface_image.width * surface_image.height * 4);
    for (r = 0; r < lut_size; ++r) {
        for (g = 0; g < lut_size; ++g) {
            for (b = 0; b < lut_size; ++b) {
                lut_idx = r * lut_size * lut_size + g * lut_size + b;
                s = &lut3d->lut[lut_idx];

                sf_idx = (r * lut_size * mul_size + g * mul_size + b) * 4;
                surface_u16[sf_idx + 0] = (mfxU16)(s->r * UINT16_MAX);
                surface_u16[sf_idx + 1] = (mfxU16)(s->g * UINT16_MAX);
                surface_u16[sf_idx + 2] = (mfxU16)(s->b * UINT16_MAX);
                // surface_u16[sf_idx + 3] is the reserved channel.
            }
        }
    }

(By the way, the comment about mfx3DLutMemoryLayout in vpl/mfxstructures.h is quite helpful for inferring the 3D LUT memory layout in video memory.)

Given the current implementation of oneVPL, to create a 3D LUT using system memory we have to use the same memory layout as for video memory. Thus SystemBuffer's 3 channels are not useful, or at least can't be treated as separate RGB channels. Here's my current implementation, which works correctly on my test video. But it's quite weird as I'm using only one channel, so I'm holding off on sending the patch to FFmpeg. (Please take a look at my github branch @xhaihao if you have time.)

I think oneVPL should convert the 3D LUT stored in SystemBuffer to the memory layout used in video memory, or maybe there is some other option for using separate RGBA channels that I have not set correctly.
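To make the requested conversion concrete, here is a minimal sketch of packing three separate channel planes into the interleaved 16-bit RGBA layout used by my video-memory code above. It is not the oneVPL implementation. The function names, the plain uint16_t planes (standing in for SystemBuffer's three channels), the plane indexing `r * n * n + g * n + b`, and the `b_stride` parameter (playing the role of `mul_size` in my code) are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of uint16_t elements the packed buffer must hold (hypothetical helper). */
static size_t packed_lut_len(size_t lut_size, size_t b_stride)
{
    return lut_size * lut_size * b_stride * 4;
}

/*
 * Pack three planar channels into interleaved R,G,B,reserved entries.
 * Assumption: each plane holds lut_size^3 values indexed as
 * r * lut_size * lut_size + g * lut_size + b.
 * Assumption: b_stride is the padded length of the innermost (b) dimension.
 */
static void pack_planar_lut(const uint16_t *r_plane, const uint16_t *g_plane,
                            const uint16_t *b_plane, size_t lut_size,
                            size_t b_stride, uint16_t *packed)
{
    assert(b_stride >= lut_size);

    /* Zero everything first so padding and the reserved channel stay 0. */
    memset(packed, 0, packed_lut_len(lut_size, b_stride) * sizeof(*packed));

    for (size_t r = 0; r < lut_size; ++r) {
        for (size_t g = 0; g < lut_size; ++g) {
            for (size_t b = 0; b < lut_size; ++b) {
                size_t src = (r * lut_size + g) * lut_size + b;
                size_t dst = ((r * lut_size + g) * b_stride + b) * 4;

                packed[dst + 0] = r_plane[src];
                packed[dst + 1] = g_plane[src];
                packed[dst + 2] = b_plane[src];
                /* packed[dst + 3] is the reserved channel, left as 0. */
            }
        }
    }
}
```

With a conversion like this inside the runtime, callers could fill the three channels independently instead of writing the packed layout into a single channel.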

FurongZhang commented 11 months ago

Let me check this issue and get back to you by this Tuesday.

FurongZhang commented 11 months ago

@cyfdecyf , thank you for reporting the issue. Yes, I checked the code and it really is an issue (I am sorry for the inconvenience). I submitted a PR to fix it: https://github.com/oneapi-src/oneVPL-intel-gpu/pull/310. At this moment, I have not had a chance to validate it, since I first need to set up a test system.

FurongZhang commented 11 months ago

If you happen to have a test app and are willing, I would much appreciate it if you could help test on your side. Thanks a lot in advance!

cyfdecyf commented 11 months ago

@FurongZhang Thanks for looking into this issue. I'll try your fix this weekend.

Will you release a new version soon? Since I'm trying to use oneVPL in FFmpeg, for a 3D LUT in system memory to work correctly I have to depend on a oneVPL version that has this issue fixed. I plan to check the oneVPL version and add a conditional compilation flag in FFmpeg.

FurongZhang commented 11 months ago

@cyfdecyf , releases are quarterly. The Q3 release happened in Oct. The Q4 release tends to be within the next 2 months.

FurongZhang commented 11 months ago

@cyfdecyf , I will be out of office next week (on leave). If you have a chance to test it on your side tomorrow, I can merge the fix tomorrow; if you confirm it this weekend, I can follow up when I am back in the office on 12/11.

cyfdecyf commented 11 months ago

@FurongZhang Finally got time to test this today.

I can't get my self-compiled libvpl (latest git master) and oneVPL-intel-gpu (on your branch) to work. To avoid creating packages on Arch Linux, I use LD_LIBRARY_PATH to force the use of my compiled versions of libvpl and oneVPL-intel-gpu. With LD_DEBUG=files I got the following log when invoking ffmpeg:

     21137:     calling init: /home/cyf/p/libvpl/build/libvplstubrt64.so.0.0
     21137:
     21137:     opening file=/home/cyf/p/libvpl/build/libvplstubrt64.so.0.0 [0]; direct_opencount=1
     21137:
     21137:
     21137:     file=/home/cyf/p/oneVPL-intel-gpu/build/__bin/release/libmfx-gen.so.1.2.10 [0];  dynamically loaded by /home/cyf/p/libvpl/build/libvpl.so.2 [0]

......

[Parsed_vpp_qsv_0 @ 0x5562e2d6a140] load 3D LUT from file: ./65x.cube
[Parsed_vpp_qsv_0 @ 0x5562e2d6a140] Use Intel(R) oneVPL to create MFX session with the specified MFX loader
     21137:     opening file=/home/cyf/p/oneVPL-intel-gpu/build/__bin/release/libmfx-gen.so.1.2.10 [0]; direct_opencount=3
     21137:
     21137:     opening file=/usr/lib/libigfxcmrt.so.7 [0]; direct_opencount=2
     21137:
[Parsed_vpp_qsv_0 @ 0x5562e2d6a140] VPP: input is system memory surface
[Parsed_vpp_qsv_0 @ 0x5562e2d6a140] VPP: output is system memory surface
[auto_scale_0 @ 0x5562e2d6bf40] w:3840 h:2160 fmt:yuv422p10le sar:1/1 -> w:3840 h:2160 fmt:p010le sar:1/1 flags:0x00000004
    Last message repeated 2 times
[Parsed_vpp_qsv_0 @ 0x5562e2d6a140] Error running VPP: unsupported (-3)
[vf#0:0 @ 0x5562e2d234c0] Error while filtering: Function not implemented
Failed to inject frame into filter network: Function not implemented

I'm not sure how to go further. Should any other dependent libraries also be updated?

FurongZhang commented 11 months ago

@cyfdecyf , I am back in the office. I will test it on my side this week.

FurongZhang commented 11 months ago

@cyfdecyf , do you have a link to your FFmpeg patch? Could I use your FFmpeg to test?

cyfdecyf commented 10 months ago

@FurongZhang Here's my implementation branch. Please drop the commit "avfilter/vf_vpp_qsv: fix 3D LUT surface." for testing your fix.

MicroYY commented 10 months ago

@cyfdecyf Could you please provide the command line and the LUT file? Or you can verify with https://github.com/oneapi-src/oneVPL-intel-gpu/pull/310 yourself.

cyfdecyf commented 10 months ago

@MicroYY I'm not able to test #310 as I mentioned before. I'm not an expert on video encoding and don't know how to get various Intel libraries to work together.

If you can compile FFmpeg and use the latest libvpl, here's the command to apply a LUT from a file using my implementation:

./ffmpeg -y -v verbose -init_hw_device qsv=hw -filter_hw_device hw -i "${INPUT}" -c:a copy \
  -preset medium -q 22 -map_metadata 0 \
  -vf "vpp_qsv=lut3d_file=${LUT3D_FILE}" -c:v hevc_qsv "${OUTPUT}"

Here's a gist that can generate an identity LUT. For better testing, I suggest using a more complex LUT.
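For readers without access to the gist, the following is a rough, independent sketch of how an identity LUT in .cube text form could be generated. It is not the gist's code. The function names are hypothetical, and it assumes the usual .cube convention of a `LUT_3D_SIZE` header followed by one "r g b" line per entry with the red index changing fastest.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Compute the identity value for entry idx of an n*n*n table.
 * Assumption: entries are ordered with the red index changing fastest,
 * then green, then blue.
 */
static void identity_entry(unsigned idx, unsigned n, float rgb[3])
{
    assert(n >= 2 && idx < n * n * n);
    rgb[0] = (float)(idx % n) / (float)(n - 1);
    rgb[1] = (float)((idx / n) % n) / (float)(n - 1);
    rgb[2] = (float)(idx / (n * n)) / (float)(n - 1);
}

/* Write a complete identity table to out; returns 0 on success, -1 on I/O error. */
static int write_identity_cube(FILE *out, unsigned n)
{
    float rgb[3];

    if (fprintf(out, "LUT_3D_SIZE %u\n", n) < 0)
        return -1;

    for (unsigned i = 0; i < n * n * n; ++i) {
        identity_entry(i, n, rgb);
        if (fprintf(out, "%.6f %.6f %.6f\n", rgb[0], rgb[1], rgb[2]) < 0)
            return -1;
    }
    return 0;
}
```

Calling `write_identity_cube(stdout, 65)` and redirecting the output to a file would give a 65-point table. An identity LUT only shows that the output is not black, so a non-trivial LUT is still the better test for channel ordering.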

Sherry-Lin commented 10 months ago

Fixed by https://github.com/oneapi-src/oneVPL-intel-gpu/pull/310. Barring surprises, it could be in the intel-onevpl-24.1.1 release, which will be tagged in about 2 weeks.

FurongZhang commented 10 months ago

@cyfdecyf , we have merged the fix into the repo. Once all tests have run, it could be in 24.1.1. In your FFmpeg patch, you may need to add a check for VPL version >= 24.1.1, since the fix only takes effect from that version. Please feel free to let us know if you have any questions, thank you!

cyfdecyf commented 10 months ago

@FurongZhang Thanks. I'll test my patch when Arch Linux has updated its onevpl package to the next release.

FurongZhang commented 10 months ago

@cyfdecyf , let me find some documentation for you on compiling VPL, the media driver, etc., so that you can also verify it on your side. We have verified it on our side.

cyfdecyf commented 9 months ago

@FurongZhang I can now confirm that my FFmpeg LUT patch using system memory works with onevpl-intel-gpu 24.1.1. (This is the version currently available on Arch Linux.)

But I don't know how to do a compile-time version check for onevpl-intel-gpu. FFmpeg uses libvpl directly, and onevpl-intel-gpu is a runtime dependency of libvpl, if I understand correctly.

FurongZhang commented 9 months ago

Thank you @cyfdecyf for the confirmation. Please let me know when your FFmpeg patch is upstreamed. If you don't mind, I would like to add the FFmpeg 3D LUT command to my article.

I suppose you are able to build onevpl-intel-gpu, right? The libvpl project is at https://github.com/intel/libvpl , and it has instructions for building and using it. Please feel free to let me know if you have any questions.

cyfdecyf commented 9 months ago

> @cyfdecyf , we have merged the fix into the repo. Once all tests have run, it could be in 24.1.1. In your FFmpeg patch, you may need to add a check for VPL version >= 24.1.1, since the fix only takes effect from that version. Please feel free to let us know if you have any questions, thank you!

@FurongZhang I don't know how to check for onevpl-intel-gpu version >= 24.1.1 in the FFmpeg source code, which only includes libvpl (now at version 2.10.1). As onevpl-intel-gpu is a runtime dependency of libvpl, I guess it's not possible to do this during FFmpeg compilation. I'll send my FFmpeg patch after resolving this version checking problem.

I'm using the packages provided by Arch Linux, so I don't need to compile anything myself.

FurongZhang commented 9 months ago

@cyfdecyf , let me check this for you: "I don't know how to check onevpl-intel-gpu version >= 24.1.1 in FFmpeg source code". I will ask Haihao if he knows.

FurongZhang commented 9 months ago

@xhaihao

xhaihao commented 9 months ago

@cyfdecyf No, we don't have a way to get the library version. What we can get is the runtime API version (note that this version might be different from the libvpl API version), for example, 2.10.1.

xhaihao commented 9 months ago

@cyfdecyf You may use QSV_VERSION_ATLEAST to check the API version and QSV_RUNTIME_VERSION_ATLEAST to check the runtime version.
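As a rough illustration of what such a runtime API-version check amounts to, here is a self-contained sketch. It does not use FFmpeg's or libvpl's headers: `api_version_t` is a stand-in for the version structure reported by the runtime, and `api_version_atleast` and `print_runtime_reminder` are hypothetical helpers, as is the log wording.

```c
#include <stdio.h>

/* Stand-in for the major/minor API version reported by the runtime (assumption). */
typedef struct {
    unsigned short Minor;
    unsigned short Major;
} api_version_t;

/* Return 1 if v is at least major.minor, 0 otherwise. */
static int api_version_atleast(api_version_t v, unsigned major, unsigned minor)
{
    return v.Major > major || (v.Major == major && v.Minor >= minor);
}

/*
 * The API version says nothing about the runtime library release, so the
 * most a caller can do is remind the user which release carries the fix.
 */
static void print_runtime_reminder(FILE *log, api_version_t v)
{
    fprintf(log,
            "Runtime API version is %u.%u; 3D LUT from system memory "
            "requires oneVPL-intel-gpu 24.1.1 or later.\n",
            (unsigned)v.Major, (unsigned)v.Minor);
}
```

A check like this can gate features by API level, but it cannot tell whether the runtime library itself is a release that contains a given fix.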

cyfdecyf commented 9 months ago

@xhaihao Thanks. I've tried and I can only get mfxVersion as 2.10.

Since there's no way to get the library version, I'll just print a message reminding users of the required version of oneVPL-intel-gpu. I'll send the FFmpeg patch this weekend.

intel / vpl-gpu-rt

VPP 3D LUT: copying system memory to video memory should convert layout #307