nvpro-samples / vk_video_samples


VP9 codec support, or HEVC + yuva420p format for RGBA support #14

BattleAxeVR closed this issue 7 months ago

BattleAxeVR commented 2 years ago

Hi, I'm trying to get the yuva420p pixel format working in my game engine (for RGBA videos), and so far the only way I can do it is via the VP9 codec, since HEVC + alpha encoders are only available on Mac.

I made some test files for yuva420p using the VP9 codec (via FFmpeg); they play back fine in VLC but assert in this sample due to the missing VK_VIDEO_CODEC_OPERATION_DECODE_VP9_BIT_KHR symbol, which isn't in vulkan_beta.h yet.
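
For reference, a command along these lines produces such a file (the file names are placeholders and the flags are from memory, so treat it as a sketch):

    ffmpeg -i rgba_input.mov -c:v libvpx-vp9 -pix_fmt yuva420p test_alpha.webm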

I presume this is coming at some point, and the VK_EXT_video_decode_vp9 define is currently disabled for that reason?

    #ifdef VK_EXT_video_decode_vp9
        { VK_VIDEO_CODEC_OPERATION_DECODE_VP9_BIT_KHR, "VP9" },

I was wondering if there is an alternative for alpha support (aside from using a separate grayscale video file), and if there is a list of which codec/pixel-format combinations are supported by Vulkan Video, the current beta drivers, and this sample. Thanks for the good work, keep it up.

BattleAxeVR commented 2 years ago

Also, just in case it helps anyone, here's a quick-and-dirty hack if you don't want the sample to rush through your video files (while waiting for framerate-vs-refresh-rate synchronization to be implemented in the sample).

    // Inside the sample's frame loop; quit, timer, and current_time are the
    // sample's existing state.
    MSG msg;
    while (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE)) {
        if (msg.message == WM_QUIT) {
            quit = true;
            break;
        }
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }

    if (quit) break;

    acquire_back_buffer();

    double t = timer.get();
    float delta_time_sec = static_cast<float>(t - current_time);
    add_frameProcessor_time(delta_time_sec);

    present_back_buffer();

    current_time = t;

    // Should really be 1/framerate taken from each video file; hardcoded for
    // now since my videos are 60 FPS.
    float desired_time_sec = 1.0f / 60.0f;

    // If the frame finished early, sleep off the rest of the frame budget.
    if (delta_time_sec < desired_time_sec)
    {
        float missing_time_sec = desired_time_sec - delta_time_sec;
        DWORD missing_time_ms = static_cast<DWORD>(missing_time_sec * 1000.0f);
        Sleep(missing_time_ms);
    }
}
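
One caveat if you copy this: Sleep() only has scheduler-tick granularity (roughly 15.6 ms by default on Windows), so short sleeps can overshoot noticeably. Raising the system timer resolution at startup tightens it (a sketch; requires linking winmm.lib):

    #include <timeapi.h>   // timeBeginPeriod / timeEndPeriod

    timeBeginPeriod(1);    // call once at startup; pair with timeEndPeriod(1) on shutdown
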
BattleAxeVR commented 2 years ago

A test clip (.mov file) using HEVC with alpha is included in this Apple sample:

https://developer.apple.com/documentation/avfoundation/media_playback_and_selection/using_hevc_video_with_alpha

I can play back the "puppets_with_alpha_hevc.mov" file in MPC-HC, VLC, and MPV (not sure if alpha is actually working or just set to FF, but it plays back at least). The file fails in this sample, though, due to an assert from FFmpeg deep inside the opaque, undebuggable DLL wrapper the sample uses, so I'm stuck.

If all major media players can play it back using the hardware in my GPU, this sample ideally should too. I'd be willing to dig into FFmpeg to fix any issues with yuva420p handling, but not while I have no access to the wrapper/glue code inside the prebuilt nv_vkvideo_parser DLL and no clue what's going on under the hood.

Any help you could provide to play back video files with alpha (either via VP9 or HEVC) would be greatly appreciated.

I've also been considering simply using separate grayscale video streams muxed into one video file (to maintain synchronization, so their frames are all decoded at the same time in parallel); would this be possible? It would open up higher-bit-depth (12- or 16-bit) grayscale formats, which would be better for encoding 3D motion vectors and a depth channel (which I also need). So I guess my follow-up question is: is multi-layer HEVC support on the roadmap, or am I better off manually synchronizing separate Vulkan Video decoder instances, one per file, and sticking to 1 (grayscale) or 3 (RGB) channels per file? Thanks!
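
For what it's worth, muxing independent streams into one container is straightforward in FFmpeg (file names here are placeholders):

    ffmpeg -i color.mp4 -i depth_gray.mp4 -map 0:v:0 -map 1:v:0 -c copy combined.mkv

The container keeps the two tracks timestamp-aligned, but the application still has to decode both and pair up frames itself.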

BattleAxeVR commented 2 years ago

Just re-read your comment in the other issue (the one about the console window popup, which still happens, unfortunately) regarding VP9/AV1. Sorry, I totally forgot that was already on your roadmap.

zlatinski commented 2 years ago

I presume this is coming at some point, and the VK_EXT_video_decode_vp9 define is currently disabled for that reason?

Sorry, VP9 is not supported yet. The Khronos Video TSG has scheduled work on that part of the Vulkan spec for next year.

I was wondering if there is an alternative for alpha support (aside from using a separate grayscale video file), and if there is a list of which codec/pixel-format combinations are supported by Vulkan Video, the current beta drivers, and this sample.

Sorry, at this point, I can't think of any other ways.

zlatinski commented 2 years ago

I can play back the "puppets_with_alpha_hevc.mov" file in MPC-HC, VLC, and MPV (not sure if alpha is actually working or just set to FF, but it plays back at least).

Thank you, @BattleAxeVR! We'll have a look.

I've also been considering simply using separate grayscale video streams muxed into one video file (to maintain synchronization, so their frames are all decoded at the same time in parallel); would this be possible? It would open up higher-bit-depth (12- or 16-bit) grayscale formats, which would be better for encoding 3D motion vectors and a depth channel (which I also need).

I'd propose you create two video decode sessions: one for the regular YCbCr stream and another for the luma-only (alpha) stream. You can then submit the two elementary streams either in parallel on two different queues, or one after the other on the same queue, each using its corresponding decode session. In the two-queue case you'll get a completion semaphore from each queue submission; make sure the graphics composition queue waits on both semaphores.
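
A minimal sketch of that submission pattern (the semaphore, command-buffer, and queue names are illustrative, not from the sample):

    // One semaphore is signaled by each video decode queue submission
    // (via pSignalSemaphores on those submits, not shown here).
    VkSemaphore          waitSems[2]   = { colorDecodeDoneSem, alphaDecodeDoneSem };
    VkPipelineStageFlags waitStages[2] = { VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                                           VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT };

    VkSubmitInfo submitInfo = {};
    submitInfo.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.waitSemaphoreCount = 2;
    submitInfo.pWaitSemaphores    = waitSems;   // composition waits on both decodes
    submitInfo.pWaitDstStageMask  = waitStages; // both images sampled in the fragment shader
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers    = &compositeCmdBuf;
    vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);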

So I guess my follow-up question is: is multi-layer HEVC support on the roadmap, or am I better off manually synchronizing separate Vulkan Video decoder instances, one per file, and sticking to 1 (grayscale) or 3 (RGB) channels per file?

@BattleAxeVR, let me think about this question. We should be able to support multi-layer HEVC. However, I'm not sure yet whether one would get a composed layer directly from the decoder, or whether the application would have to composite the layers with the graphics API. I'll get back to you on this. In the interim, if you have any additional content you'd like us to support beyond what you've already supplied, please post it here.

zlatinski commented 2 years ago

@BattleAxeVR, I can reproduce the issue with the content from https://developer.apple.com/documentation/avfoundation/media_playback_and_selection/using_hevc_video_with_alpha. We'll address it with the next release.

BattleAxeVR commented 2 years ago

Thanks so much!

HEVC-with-alpha decoding would suit me much better than VP9 anyway. Now I just need to find a way to encode HEVC + alpha using yuva420p on a PC (or an equivalent higher-bit-depth pixel format with alpha). Any chance the same fix could cover the encoding side too? I could use that implementation for encoding. FFmpeg's libx265 doesn't support yuva420p, but the underlying NVIDIA hardware surely handles the format, since GPU decoding of those files works in other media players (probably via DXVA or NVDEC).

For grayscale 10/12/16-bit single-channel formats, the current sample appears to only support RGB packed or planar formats, and bit depth alone is used to determine the format; to support grayscale videos, you'd probably need to use the AVPixelFormat value instead of GetBitDepth to determine the output Vulkan format.
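
A minimal sketch of that idea (the function name and mapping are hypothetical, not the sample's actual code):

    #include <libavutil/pixfmt.h>
    #include <vulkan/vulkan.h>

    // Choose the output VkFormat from the decoder's AVPixelFormat
    // instead of inferring it from bit depth alone.
    static VkFormat PickOutputFormat(enum AVPixelFormat pixFmt)
    {
        switch (pixFmt) {
        case AV_PIX_FMT_GRAY8:    return VK_FORMAT_R8_UNORM;
        case AV_PIX_FMT_GRAY12LE: // 12-bit samples in a 16-bit container; needs scaling
        case AV_PIX_FMT_GRAY16LE: return VK_FORMAT_R16_UNORM;
        case AV_PIX_FMT_YUV420P:  return VK_FORMAT_G8_B8_R8_3PLANE_420_UNORM;
        default:                  return VK_FORMAT_UNDEFINED; // unsupported here
        }
    }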


BattleAxeVR commented 2 years ago

For grayscale video formats, if you get a chance to look into those, that would also be great (12- or 16-bit for depth would be ideal):

    AV_PIX_FMT_GRAY8,    ///< Y, 8bpp
    AV_PIX_FMT_GRAY12BE, ///< Y, 12bpp, big-endian
    AV_PIX_FMT_GRAY12LE, ///< Y, 12bpp, little-endian
    AV_PIX_FMT_GRAY10BE, ///< Y, 10bpp, big-endian
    AV_PIX_FMT_GRAY10LE, ///< Y, 10bpp, little-endian
    AV_PIX_FMT_GRAY16BE, ///< Y, 16bpp, big-endian
    AV_PIX_FMT_GRAY16LE, ///< Y, 16bpp, little-endian

Some of the 64-bit 4:4:4 formats including alpha would also be great, to pack 3D MVs into the RGB channels and depth into the alpha channel. I'm using this for fire/smoke volumetric video from EmberGen, with DLSS, which needs the MVs to resample/reproject properly.

These two formats in particular would be ideal for some of these use cases:

    AV_PIX_FMT_YUVA444P16BE, ///< planar YUV 4:4:4 64bpp (1 Cr & Cb sample per 1x1 Y & A samples, big-endian)
    AV_PIX_FMT_YUVA444P16LE, ///< planar YUV 4:4:4 64bpp (1 Cr & Cb sample per 1x1 Y & A samples, little-endian)

For RGB or RGBA colour data I have no need for anything but 4:2:0 (4:4:4 is a waste of storage/bandwidth/bitrate for colour). But for 3D MVs, I'm not sure how well YUV encodings would work for such linear data, if at all. It's possible that full-res Y for motion-vector magnitude and half-res CbCr for direction could be OK, though, as motion-vector directions are probably highly correlated in space. Maybe 4:2:0 YUVA for 3D MVs with depth in alpha would actually work. Who knows; it needs testing. But the higher bit depth is absolutely necessary for storing depth, no question.

zlatinski commented 2 years ago

Hi @BattleAxeVR. It appears that providing support for a luma-only channel wouldn't be very straightforward. However, towards the start of next year we plan to add support for multi-layer HEVC (MV-HEVC). Hopefully that will work for you.

BattleAxeVR commented 2 years ago

I think so, yeah.

Ideally, what would work best for volumetric video (capturing an entire animated point cloud) would be one YUVA layer for colour and alpha at 8 or 10 bits per channel (alpha full-res like luma; half-res chroma is fine for the colour channels), plus another YUVA layer at 12 or even 16 bits for 3D motion vectors with depth in the alpha channel. 4:2:0 might work for 3D MVs, since luma would effectively encode velocity magnitude and chroma its direction, which probably changes more slowly than the magnitude in the average case.

Finally, I could use a third layer in the video file for normals, although it might be best to store them in tangent space using two channels (for consistency with the rest of my rendering). If not, I'll just use a three-channel RGB format for model-space normals, probably 8-bit 4:4:4.

BattleAxeVR commented 2 years ago

Hi, just wondering whether these features/issues are still being worked on or are on your TODO list. Thanks in advance for any update; it will help me make an informed decision in my own planning.

zlatinski commented 2 years ago

Hi BattleAxeVR, Unfortunately, because of a change in the project's priorities, the MV-HEVC feature has been postponed. Please accept our apologies for any inconvenience this has caused!

BattleAxeVR commented 2 years ago

Any chance you could take a few minutes to fix the popup issue? I reported it a year ago. The rest of the feature requests I can work around by decoding multiple streams with different instances.

zlatinski commented 2 years ago

Any chance you could take a few minutes to fix the popup issue? I reported it a year ago. The rest of the feature requests I can work around by decoding multiple streams with different instances.

We are preparing a new release this week in which, in addition to the DLLs, you'll also get static libraries. When using the parser's static libraries, you should be able to build with whatever Windows options you like - console or not.

BattleAxeVR commented 2 years ago

Thank you! Really appreciated.

BattleAxeVR commented 1 year ago

That's very exciting! If I use a version of FFmpeg with this patch integrated, what does that mean: is RGB+A over HEVC supported in Vulkan Video now? Or is it just one more piece of the puzzle necessary for it to work? @zlatinski

zlatinski commented 1 year ago

Or is it just one more piece of the puzzle necessary for it to work?

Maybe. Our HW supports multi-layer decode, but we haven't had the time to bring this to the spec and experiment with it.

BattleAxeVR commented 1 year ago

In practice, is there a benefit to this in terms of hardware decoding requirements? I assume multi-layer decoding requires multiple video decode pipelines running in parallel behind the scenes. I'm just wondering whether HEVC RGB+A can be done in a single decode stream from start to finish, if you get my meaning. Either way, it would save me writing code to manage multiple Vulkan Video decoder instances running in parallel and syncing them by frame index. That's the part I'd rather not do manually, since it seems much better handled at a lower level (hidden from the application, I mean).

BattleAxeVR commented 8 months ago

Hi BattleAxeVR, Unfortunately, because of a change in the project's priorities, the MV-HEVC feature has been postponed.

@zlatinski Hi Tony (happy new year, btw). I just wanted to double-check on this: has any work been done (or is any still planned) on MV-HEVC or VP9 support for Vulkan Video?

I just want to confirm either way, even if the answer is no (which is fine), before I start working on my own multi-video-file implementation.

Syncing the decoding of multiple entire video files perfectly could potentially be tricky, but I just don't want to spend my time on something Nvidia (or others) already have in progress.

Edit: I searched for MV-HEVC and noticed there does indeed seem to be some HEVC parser code related to multiple layers, which could solve my woes elegantly (hopefully it supports more than 2 layers? I have a bunch of extra data I could stuff into those layers, even if the decoded frames all end up in separate Vulkan textures, which is perfect anyway).

https://github.com/nvpro-samples/vk_video_samples/blob/1d9eb74e3d4f03ed56d0d9a0d750d1036d8c63fc/vk_video_decoder/libs/NvVideoParser/src/VulkanH265Parser.cpp#L214