k4yt3x / video2x

A machine learning-based lossless video super resolution framework. Est. Hack the Valley II, 2018.
https://video2x.org
GNU Affero General Public License v3.0

Completely black video output [Beta 6.0.0-3 + docker fix pull request] #1187

Closed BuyMyMojo closed 2 days ago

BuyMyMojo commented 3 weeks ago

It does the full encode and hammers my GPU as expected, but the result is a black screen.

input video:

https://github.com/user-attachments/assets/ecf23845-8327-4289-9462-ab6c6f052795

Output:

https://github.com/user-attachments/assets/7517f701-687c-4c8c-8fb2-3784a742541d

command:

sudo docker run -it --rm --device=/dev/dri/renderD129 -v $PWD:/host video2x:6.0.0-beta.3-dockerfix3 --input "15s-bigbuckbunny.mp4" --output "15s-bigbuckbunny-2x.mp4" -f realesrgan -r 2 -m realesr-animevideov3

k4yt3x commented 3 weeks ago

I tried it on my A6000 and it seems to work fine. Perhaps you're mounting the wrong device or you're running out of VRAM? What GPU are you using?

https://github.com/user-attachments/assets/433a5359-4f8f-48b6-90c3-fdccbec3fa15
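On a machine with more than one GPU, each card gets its own render node under /dev/dri, and the numbering is not tied to a vendor. The following is a minimal sketch for checking which kernel driver sits behind each node before passing one with --device. It assumes the standard Linux sysfs layout; list_render_nodes is a hypothetical helper name, not part of video2x or Docker.

```shell
# Hypothetical helper: print each DRM render node with the kernel driver bound to it.
# The optional first argument overrides the sysfs directory (default: /sys/class/drm).
list_render_nodes() {
    drm_root=${1:-/sys/class/drm}
    for node in "$drm_root"/renderD*; do
        # Skip the unexpanded glob pattern when no render node exists.
        [ -e "$node" ] || continue
        # device/driver is a symlink whose target's last component is the driver name.
        driver=$(basename "$(readlink -f "$node/device/driver")")
        printf '/dev/dri/%s %s\n' "$(basename "$node")" "$driver"
    done
}
```

Calling list_render_nodes with no arguments should print one line per node, for example a node backed by amdgpu for an AMD card and one backed by i915 or xe for an Intel card.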

BuyMyMojo commented 3 weeks ago

RX 7800 XT. It's for sure using the right one: I can see the GPU usage go up, but the VRAM usage doesn't really get too high.

BuyMyMojo commented 3 weeks ago

image

k4yt3x commented 3 weeks ago

How curious. It looks to be working normally -- your device is detected, it's processing frames, and you have plenty of VRAM. Just from looking at this, I'm honestly not sure why it's outputting black frames. On the other hand, I'm sure libplacebo works fine, right?

BuyMyMojo commented 3 weeks ago

Since it's the Docker build, I don't see why it wouldn't be working fine. Strangely, though, I don't see the output file's size grow at all until the run is done, at which point the video is blank. mpv still reports it as 60 fps, but with a bitrate of around 128k, about the same as the audio.
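A blank result like this can also be confirmed without watching the file. The following is a minimal sketch using ffmpeg's blackdetect filter, assuming ffmpeg is installed on the host. black_intervals is a hypothetical helper name, and the 0.5 s minimum duration is an arbitrary choice for illustration.

```shell
# Hypothetical helper: print the black intervals that ffmpeg's blackdetect filter
# reports for a video file. The exit status is non-zero when none are reported.
black_intervals() {
    # d=0.5     report black runs lasting at least 0.5 seconds
    # pix_th    luminance threshold below which a pixel counts as black
    # -f null - decode and filter only; no output file is written
    ffmpeg -hide_banner -nostdin -nostats -i "$1" \
        -vf blackdetect=d=0.5:pix_th=0.10 -an -f null - 2>&1 |
        grep -o 'black_start:.*'
}
```

Running black_intervals on the upscaled file and seeing a single interval that spans the whole duration would match the symptom described here. No output at all means the filter found no black run of that length.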

k4yt3x commented 3 weeks ago

If you turn on debug logging (--loglevel debug), does it print anything interesting?

BuyMyMojo commented 3 weeks ago

Oh wow, yeah, it does not like it:

[0 AMD Radeon RX 7800 XT (RADV NAVI32)]  queueC=1[4]  queueG=0[1]  queueT=0[1]
[0 AMD Radeon RX 7800 XT (RADV NAVI32)]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[0 AMD Radeon RX 7800 XT (RADV NAVI32)]  fp16-p/s/u/a=1/1/1/1  int8-p/s/u/a=1/1/1/1
[0 AMD Radeon RX 7800 XT (RADV NAVI32)]  subgroup=64  basic/vote/ballot/shuffle=1/1/1/1
[0 AMD Radeon RX 7800 XT (RADV NAVI32)]  fp16-8x8x16/16x8x8/16x8x16/16x16x16=0/0/0/1
[h264 @ 0x7f8690003c80] nal_unit_type: 6(SEI), nal_ref_idc: 0
[h264 @ 0x7f8690003c80] nal_unit_type: 5(IDR), nal_ref_idc: 3
[h264 @ 0x7f8690003c80] Format yuv420p chosen by get_format().
[h264 @ 0x7f8690003c80] Reinit context to 1920x1088, pix_fmt: yuv420p
[h264 @ 0x7f8690003c80] no picture 
[h264 @ 0x7f8690003c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
[h264 @ 0x7f8690003c80] no picture 
[h264 @ 0x7f8690003c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
Processing frame 0/898 (0.00%); time elapsed: 1s[2024-10-14 17:42:19.088] [debug] Processed frame 1/898
[h264 @ 0x7f8690003c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
Processing frame 1/898 (0.11%); time elapsed: 1s[2024-10-14 17:42:19.306] [debug] Processed frame 2/898
[h264 @ 0x7f8690003c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
Processing frame 2/898 (0.22%); time elapsed: 1s[2024-10-14 17:42:19.521] [debug] Processed frame 3/898
[h264 @ 0x7f8690003c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
Processing frame 3/898 (0.33%); time elapsed: 1s[2024-10-14 17:42:19.742] [debug] Processed frame 4/898
[h264 @ 0x7f8690003c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0
Processing frame 4/898 (0.45%); time elapsed: 1s[2024-10-14 17:42:19.960] [debug] Processed frame 5/898
[h264 @ 0x7f8690003c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0

k4yt3x commented 3 weeks ago

Ah at least that's some progress. I'll look into what this error means. Thanks for the log.

k4yt3x commented 3 weeks ago

Actually I tried it myself and I'm getting similar messages, but my output video is fine:

image

BuyMyMojo commented 3 weeks ago

I looked again, and before it lists the settings of the x264 encode, it prints this:

Video processing started; press SPACE to pause/resume, 'q' to abort.
[AVFormatContext @ 0x7f8030001500] Opening '15s-bigbuckbunny.mp4' for reading
[file @ 0x7f8030001b00] Setting default whitelist 'file,crypto,data'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] ISO: File Type Major Brand: isom
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] Processing st: 0, edit list 0 - media time: 800, duration: 358800
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] drop a frame at curr_cts: 359600 @ 896
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] Offset DTS by 800 to make first pts zero.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] Setting codecpar->delay to 2 for stream st: 0
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] Processing st: 1, edit list 0 - media time: 1025, duration: 718848
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] drop a frame at curr_cts: 0 @ 0
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] skip 1 audio samples from curr_cts: 1024
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] Before avformat_find_stream_info() pos: 29029595 bytes read:56471 seeks:1 nb_streams:2
[h264 @ 0x7f8030002640] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 0x7f8030002640] Decoding VUI
[h264 @ 0x7f8030002640] nal_unit_type: 8(PPS), nal_ref_idc: 3
Transform tree:
    mdct_inv_float_avx2 - type: mdct_float, len: 64, factors[2]: [2, any], flags: [aligned, out_of_place, inv_only]
        fft32_asm_float_fma3 - type: fft_float, len: 32, factor: 2, flags: [aligned, inplace, out_of_place, preshuf, asm_call]
Transform tree:
    mdct_inv_float_avx2 - type: mdct_float, len: 64, factors[2]: [2, any], flags: [aligned, out_of_place, inv_only]
        fft32_asm_float_fma3 - type: fft_float, len: 32, factor: 2, flags: [aligned, inplace, out_of_place, preshuf, asm_call]
Transform tree:
    mdct_inv_float_avx2 - type: mdct_float, len: 120, factors[2]: [2, any], flags: [aligned, out_of_place, inv_only]
        fft_pfa_15xM_asm_float_avx2 - type: fft_float, len: 60, factors[2]: [15, 2], flags: [aligned, inplace, out_of_place, preshuf, asm_call]
            fft4_fwd_asm_float_sse2 - type: fft_float, len: 4, factor: 2, flags: [aligned, inplace, out_of_place, preshuf, asm_call]
Transform tree:
    mdct_inv_float_avx2 - type: mdct_float, len: 128, factors[2]: [2, any], flags: [aligned, out_of_place, inv_only]
        fft_sr_asm_float_fma3 - type: fft_float, len: 64, factor: 2, flags: [aligned, inplace, out_of_place, preshuf, asm_call]
Transform tree:
    mdct_inv_float_avx2 - type: mdct_float, len: 480, factors[2]: [2, any], flags: [aligned, out_of_place, inv_only]
        fft_pfa_15xM_asm_float_avx2 - type: fft_float, len: 240, factors[2]: [15, 2], flags: [aligned, inplace, out_of_place, preshuf, asm_call]
            fft16_asm_float_fma3 - type: fft_float, len: 16, factor: 2, flags: [aligned, inplace, out_of_place, preshuf, asm_call]
Transform tree:
    mdct_inv_float_avx2 - type: mdct_float, len: 512, factors[2]: [2, any], flags: [aligned, out_of_place, inv_only]
        fft_sr_asm_float_fma3 - type: fft_float, len: 256, factor: 2, flags: [aligned, inplace, out_of_place, preshuf, asm_call]
Transform tree:
    mdct_inv_float_avx2 - type: mdct_float, len: 960, factors[2]: [2, any], flags: [aligned, out_of_place, inv_only]
        fft_pfa_15xM_asm_float_avx2 - type: fft_float, len: 480, factors[2]: [15, 2], flags: [aligned, inplace, out_of_place, preshuf, asm_call]
            fft32_asm_float_fma3 - type: fft_float, len: 32, factor: 2, flags: [aligned, inplace, out_of_place, preshuf, asm_call]
Transform tree:
    mdct_inv_float_avx2 - type: mdct_float, len: 1024, factors[2]: [2, any], flags: [aligned, out_of_place, inv_only]
        fft_sr_asm_float_fma3 - type: fft_float, len: 512, factor: 2, flags: [aligned, inplace, out_of_place, preshuf, asm_call]
Transform tree:
    mdct_fwd_float_c - type: mdct_float, len: 1024, factors[2]: [2, any], flags: [unaligned, out_of_place, fwd_only]
        fft_sr_ns_float_fma3 - type: fft_float, len: 512, factor: 2, flags: [aligned, inplace, out_of_place, preshuf]
[h264 @ 0x7f8030002640] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 0x7f8030002640] Decoding VUI
[h264 @ 0x7f8030002640] nal_unit_type: 8(PPS), nal_ref_idc: 3
[h264 @ 0x7f8030002640] nal_unit_type: 6(SEI), nal_ref_idc: 0
[h264 @ 0x7f8030002640] nal_unit_type: 5(IDR), nal_ref_idc: 3
[h264 @ 0x7f8030002640] Format yuv420p chosen by get_format().
[h264 @ 0x7f8030002640] Reinit context to 1920x1088, pix_fmt: yuv420p
[h264 @ 0x7f8030002640] no picture 
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] demuxer injecting skip 1025 / discard 0
[aac @ 0x7f8030003c80] skip 1025 / discard 0 samples due to side data
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] After avformat_find_stream_info() pos: 1508 bytes read:89239 seeks:2 frames:2
[h264 @ 0x7f8030003c80] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 0x7f8030003c80] Decoding VUI
[h264 @ 0x7f8030003c80] nal_unit_type: 8(PPS), nal_ref_idc: 3
[libx264 @ 0x7f8030017800] using mv_range_thread = 24

The main thing I'm noticing here is this part:

[h264 @ 0x7f8030002640] Format yuv420p chosen by get_format().
[h264 @ 0x7f8030002640] Reinit context to 1920x1088, pix_fmt: yuv420p
[h264 @ 0x7f8030002640] no picture 
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] demuxer injecting skip 1025 / discard 0
[aac @ 0x7f8030003c80] skip 1025 / discard 0 samples due to side data
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f8030001500] After avformat_find_stream_info() pos: 1508 bytes read:89239 seeks:2 frames:2

I'm not entirely sure why it reports "no picture" for this input and for a couple of other test videos I tried.

The Big Buck Bunny clip is also a fresh, basic libx264 encode straight from ffmpeg, so there should be no issues with it.

BuyMyMojo commented 3 weeks ago

Never mind, I just saw that your screenshot has that too...

I also tried an x265-encoded video and still had no luck. This is very strange.

I'll try again with a fresh Docker build.

BuyMyMojo commented 3 weeks ago

Can confirm this is an issue specifically with realesr; I got an output with libplacebo.

https://github.com/user-attachments/assets/9b1a18ca-bb2c-407d-ae6b-ca5122da2e4b

BuyMyMojo commented 3 weeks ago

Did a test with my other card; it seems to be an issue with AMD and realesr.

It runs fine, if significantly slower, on my Arc A770 (0.95 FPS vs. 4.something FPS with the AMD card).

https://github.com/user-attachments/assets/4eba001a-a97d-4b36-8bda-e6f1fce2c6f3

k4yt3x commented 3 weeks ago

Since I ran that same video in my environment without issues, I don't think the video is the problem here. I wonder if it's related to how the GPU is passed into the container. Maybe something is missing.

k4yt3x commented 3 weeks ago

Ah, I reviewed your command. Try adding --gpus all: https://github.com/K4YT3X/video2x/wiki/Container#amd-gpus

BuyMyMojo commented 3 weeks ago

Adding that gives me the same error as this old Docker issue: https://github.com/docker/cli/issues/2063

They mention CDI devices here. I'll try enabling that and see if anything changes: https://github.com/docker/cli/issues/2063#issuecomment-2150477271

BuyMyMojo commented 3 weeks ago

CDI devices don't seem to be something I can figure out myself, but what most people recommend seems to be the way of mounting the GPU that I've already been using. Again, it works fine for Anime4K/libplacebo, just not for realesr, for some reason.

From a quick look around, it seems this issue can occur when the model can't be found, so I wonder if it just isn't getting copied properly when I build the container.

k4yt3x commented 3 weeks ago

Nah, with 6.0.0, if it can't find the model it will just throw an error and stop. I still think it's how the GPU is passed. I'm fighting with OpenCV for now... I'll take another look at this once I'm done.

k4yt3x commented 2 weeks ago

I've asked around and tested on my other devices, and I haven't been able to reproduce the issue on any NVIDIA cards or on AMD cards earlier than RDNA3. However, someone on my Telegram channel pointed out that it might be caused by ncnn's incompatibility with WMMA, which was introduced in RDNA3. They were able to reproduce it on a 780M with both video2x and -ncnn-vulkan builds. However, with earlier static builds of the same -ncnn-vulkan tools, the output is fine.

What could be causing the issue here is that the current ncnn is incompatible with WMMA in RDNA3 and later architectures. I'll need more cases and tests to confirm this. If it's true, it would be a bug that ncnn needs to fix.

k4yt3x commented 2 days ago

I had some friends help me test this. It seems like it's a bug that's triggered with RADV + ncnn. You can use AMDVLK to bypass this issue for now. It's up to RADV to fix it.
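Switching to AMDVLK comes down to telling the Vulkan loader which driver manifest to load. The following is a minimal sketch, assuming AMDVLK is installed alongside Mesa. with_vulkan_icd is a hypothetical helper name, and the manifest path passed to it is an assumption that varies by distribution. VK_ICD_FILENAMES is the Vulkan loader's environment variable for restricting the manifests it loads (newer loaders also accept VK_DRIVER_FILES).

```shell
# Hypothetical helper: run a command with the Vulkan loader restricted to one ICD manifest.
# Usage: with_vulkan_icd /path/to/manifest.json command [args...]
with_vulkan_icd() {
    icd_manifest=$1
    shift
    # The variable is set only in the environment of the command being run.
    VK_ICD_FILENAMES="$icd_manifest" "$@"
}
```

For example, with_vulkan_icd /usr/share/vulkan/icd.d/amd_icd64.json vulkaninfo --summary should list the AMD card under the AMDVLK driver if that manifest exists at that (assumed) path. Inside a container, the AMDVLK package and its manifest would have to be present in the image, and the variable could be passed with docker run -e. Whether the video2x image ships AMDVLK is not stated in this thread.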

BuyMyMojo commented 1 day ago

> I had some friends help me test this. It seems like it's a bug that's triggered with RADV + ncnn. You can use AMDVLK to bypass this issue for now. It's up to RADV to fix it.

I appreciate your testing! Did you spot an existing issue on the RADV repo during your research, or should I go open one?

k4yt3x commented 1 day ago

This was forwarded to me:

https://gitlab.freedesktop.org/mesa/mesa/-/issues/10847

I'm not 100% sure this is the exact issue, but it's definitely either with ncnn or RADV.
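To check whether a given run went through RADV, the device lines in the debug log earlier in this thread carry the driver name in parentheses, as in "(RADV NAVI32)". The following is a minimal sketch that pulls those device tags out of a saved log. radv_devices is a hypothetical helper name, and it assumes the log keeps the bracketed device prefix shown above.

```shell
# Hypothetical helper: list the distinct devices in a video2x debug log whose
# bracketed device tag names the RADV driver. Reads the named files, or stdin if none.
radv_devices() {
    # Match a leading "[<index> <device name> (RADV <chip>)]" tag and de-duplicate.
    grep -o '^\[[0-9][0-9]* [^]]*(RADV [^)]*)]' "$@" | sort -u
}
```

If the log was saved to a file, radv_devices followed by that file name prints one line per RADV-backed device. Empty output suggests the run used a different Vulkan driver.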