rigaya / NVEnc

Performance experiments with high-speed encoding using NVENC
https://rigaya34589.blog.fc2.com/blog-category-17.html

y4m, resolution problem #572

Closed Selur closed 2 months ago

Selur commented 3 months ago

Calling:

ffmpeg -y -loglevel fatal -noautorotate -nostdin -threads 8 -ignore_editlist true -i "C:\Users\Selur\Desktop\source.mp4" -map 0:0 -an -sn -color_primaries bt709 -color_trc bt709 -colorspace bt709 -color_range tv  -pix_fmt yuv420p10le -strict -1 -vsync 0 -f yuv4mpegpipe - | NVEncC --y4m -i - --fps 23.976 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt709 --cuda-schedule sync --output "J:\tmp\source_1_2024-03-21@11_53_56_9710_01.av1"

I see:

--------------------------------------------------------------------------------
J:\tmp\source_1_2024-03-21@11_53_56_9710_01.av1
--------------------------------------------------------------------------------
NVEncC (x64) 7.46 (r2779) by rigaya, Mar 13 2024 12:17:47 (VC 1929/Win)
OS Version     Windows 11 x64 (22631) [UTF-8]
CPU            AMD Ryzen 9 7950X 16-Core Processor [5.52GHz] (16C/32T)
GPU            #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.86]
NVENC / CUDA   NVENC API 12.1, CUDA 12.4, schedule mode: sync
Input Buffers  CUDA, 20 frames
Input Info     y4m(yv12(10bit))->p010 [AVX2], 1000x542, 24000/1001 fps
Vpp Filters    copyHtoD
Output Info    AV1 main 10bit @ Level auto
               1000x542p 1:1 23.976fps (24000/1001fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 0 kbps)
Target Quality 23.00
QP Offset      cb:0  cr:0
VBV buf size   auto
Split Enc Mode auto
Lookahead      off
GOP length     240 frames
B frames       3 frames [ref mode: middle]
Ref frames     7 frames, MultiRef L0:auto L1:auto
AQ             on (spatial, temporal, strength 5)
Part size      max auto / min auto
Tile num       columns auto / rows auto
TemporalLayers max 1
Refs           forward auto, backward auto
VUI            matrix:bt709,range:limited
Others         mv:Q-pel

NVEncC then stops without an error, and a 0-byte file is created.

Using another file:

ffmpeg -y -loglevel fatal -noautorotate -nostdin -threads 8 -i "G:\TestClips&Co\files\10bit Test.mkv" -map 0:0 -an -sn -color_primaries bt709 -color_trc bt709 -colorspace bt709 -color_range tv  -pix_fmt yuv420p10le -strict -1 -vsync 0 -f yuv4mpegpipe - | NVEncC --y4m -i - --fps 23.976 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt709 --cuda-schedule sync --output "J:\tmp\source_1_2024-03-21@11_53_56_9710_01.av1"

The output is created without a problem:

--------------------------------------------------------------------------------
J:\tmp\source_1_2024-03-21@11_53_56_9710_01.av1
--------------------------------------------------------------------------------
NVEncC (x64) 7.46 (r2779) by rigaya, Mar 13 2024 12:17:47 (VC 1929/Win)
OS Version     Windows 11 x64 (22631) [UTF-8]
CPU            AMD Ryzen 9 7950X 16-Core Processor [5.77GHz] (16C/32T)
GPU            #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.86]
NVENC / CUDA   NVENC API 12.1, CUDA 12.4, schedule mode: sync
Input Buffers  CUDA, 20 frames
Input Info     y4m(yv12(10bit))->p010 [AVX2], 640x352, 24000/1001 fps
Vpp Filters    copyHtoD
Output Info    AV1 main 10bit @ Level auto
               640x352p 1:1 23.976fps (24000/1001fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 0 kbps)
Target Quality 23.00
QP Offset      cb:0  cr:0
VBV buf size   auto
Split Enc Mode auto
Lookahead      off
GOP length     240 frames
B frames       3 frames [ref mode: middle]
Ref frames     7 frames, MultiRef L0:auto L1:auto
AQ             on (spatial, temporal, strength 5)
Part size      max auto / min auto
Tile num       columns auto / rows auto
TemporalLayers max 1
Refs           forward auto, backward auto
VUI            matrix:bt709,range:limited
Others         mv:Q-pel

encoded 429 frames, 1722.89 fps, 610.02 kbps, 1.30 MB
encode time 0:00:00, CPULoad: 2.4%
frame type IDR   2
frame type I     2,  total size  0.02 MB
frame type P   107,  total size  0.00 MB
frame type B   320,  total size  1.28 MB

After some testing, the problem seems to be the resolution! Taking the file that worked fine as input and resizing it to 1000x542, I get the same problem:

ffmpeg -y -loglevel fatal -noautorotate -nostdin -threads 8 -i "G:\TestClips&Co\files\10bit Test.mkv" -map 0:0 -an -sn -vf scale=1000:542 -color_primaries bt709 -color_trc bt709 -colorspace bt709 -color_range tv -pix_fmt yuv420p10le -strict -1 -vsync 0  -sws_flags spline -f yuv4mpegpipe - | NVEncC --y4m -i - --fps 25.000 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt470bg --cuda-schedule sync --output "J:\tmp\10bit Test_1_2024-03-21@12_03_03_9410_01.av1"

Only a 0-byte file is created:

--------------------------------------------------------------------------------
J:\tmp\10bit Test_1_2024-03-21@12_03_03_9410_01.av1
--------------------------------------------------------------------------------
NVEncC (x64) 7.46 (r2779) by rigaya, Mar 13 2024 12:17:47 (VC 1929/Win)
OS Version     Windows 11 x64 (22631) [UTF-8]
CPU            AMD Ryzen 9 7950X 16-Core Processor [5.52GHz] (16C/32T)
GPU            #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.86]
NVENC / CUDA   NVENC API 12.1, CUDA 12.4, schedule mode: sync
Input Buffers  CUDA, 20 frames
Input Info     y4m(yv12(10bit))->p010 [AVX2], 1000x542, 25/1 fps
Vpp Filters    copyHtoD
Output Info    AV1 main 10bit @ Level auto
               1000x542p 1:1 25.000fps (25/1fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 0 kbps)
Target Quality 23.00
QP Offset      cb:0  cr:0
VBV buf size   auto
Split Enc Mode auto
Lookahead      off
GOP length     250 frames
B frames       3 frames [ref mode: middle]
Ref frames     7 frames, MultiRef L0:auto L1:auto
AQ             on (spatial, temporal, strength 5)
Part size      max auto / min auto
Tile num       columns auto / rows auto
TemporalLayers max 1
Refs           forward auto, backward auto
VUI            matrix:bt470bg,range:limited
Others         mv:Q-pel

Using different codecs (av1, h264, h265), I get the same issue.

Taking the file which caused the problem and using avhw:

NVEncC --avhw  -i "C:\Users\Selur\Desktop\source.mp4" --fps 23.976 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt709 --vpp-resize auto --output-res 1000x542 --vpp-gauss disabled --cuda-schedule sync --output "J:\tmp\source_1_2024-03-21@12_08_10_2610_01.av1"

Encoding works fine:

--------------------------------------------------------------------------------
J:\tmp\source_1_2024-03-21@12_08_10_2610_01.av1
--------------------------------------------------------------------------------
avcuvid: Unknown input option: framerate=24000/1001, ignored.
NVEncC (x64) 7.46 (r2779) by rigaya, Mar 13 2024 12:17:47 (VC 1929/Win)
OS Version     Windows 11 x64 (22631) [UTF-8]
CPU            AMD Ryzen 9 7950X 16-Core Processor [5.52GHz] (16C/32T)
GPU            #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.86]
NVENC / CUDA   NVENC API 12.1, CUDA 12.4, schedule mode: sync
Input Buffers  CUDA, 20 frames
Input Info     avcuvid: H.265/HEVC, 1000x542, 24000/1001 fps
Vpp Filters    cspconv(nv12 -> p010)
Output Info    AV1 main 10bit @ Level auto
               1000x542p 1:1 23.976fps (24000/1001fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 0 kbps)
Target Quality 23.00
QP Offset      cb:0  cr:0
VBV buf size   auto
Split Enc Mode auto
Lookahead      off
GOP length     240 frames
B frames       3 frames [ref mode: middle]
Ref frames     7 frames, MultiRef L0:auto L1:auto
AQ             on (spatial, temporal, strength 5)
Part size      max auto / min auto
Tile num       columns auto / rows auto
TemporalLayers max 1
Refs           forward auto, backward auto
VUI            matrix:bt709,range:limited
Others         mv:Q-pel

encoded 746 frames, 1055.16 fps, 1894.16 kbps, 7.03 MB
encode time 0:00:00, CPULoad: 1.0%
frame type IDR   4
frame type I     4,  total size  0.14 MB
frame type P   187,  total size  0.02 MB
frame type B   555,  total size  6.87 MB

=> It seems there is a problem with the y4m parsing that depends on the resolution.

Cu Selur

PS: I attached source.mp4: https://github.com/rigaya/NVEnc/assets/843640/05646b43-1c09-4ec7-9bab-8cc6eff73862
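
For anyone trying to narrow down a y4m problem like the one reported above, it can help to know how many bytes each frame in the pipe should contain. The following is a minimal Python sketch, not code from NVEnc or ffmpeg: it parses a y4m stream header and computes the expected payload size per frame. The function names, the `CHROMA` table, and the example header line are assumptions made for illustration. The header ffmpeg actually writes may carry additional tags, and those are ignored here.

```python
import math

# Colourspace tag -> (bytes per sample, horizontal subsampling, vertical subsampling).
# Only a few common tags are listed; this table is an illustrative assumption.
CHROMA = {
    "420": (1, 2, 2),
    "420p10": (2, 2, 2),
    "422": (1, 2, 1),
    "444": (1, 1, 1),
}


def parse_y4m_header(line: bytes) -> dict:
    """Parse a 'YUV4MPEG2 ...' stream header line into a dict."""
    tokens = line.decode("ascii").strip().split(" ")
    if tokens[0] != "YUV4MPEG2":
        raise ValueError("not a y4m stream header")
    info = {"colorspace": "420"}  # 4:2:0 8-bit is assumed when no C tag is present
    for tok in tokens[1:]:
        if not tok:
            continue
        key, val = tok[0], tok[1:]
        if key == "W":
            info["width"] = int(val)
        elif key == "H":
            info["height"] = int(val)
        elif key == "F":
            num, den = val.split(":")
            info["fps"] = (int(num), int(den))
        elif key == "C":
            info["colorspace"] = val
        # Other tags (I, A, X...) are not needed for the size calculation.
    return info


def frame_bytes(info: dict) -> int:
    """Expected payload size of one frame (excluding the 'FRAME' marker line)."""
    bytes_per_sample, sub_x, sub_y = CHROMA[info["colorspace"]]
    w, h = info["width"], info["height"]
    chroma_w = math.ceil(w / sub_x)
    chroma_h = math.ceil(h / sub_y)
    return (w * h + 2 * chroma_w * chroma_h) * bytes_per_sample


# Hypothetical header for the failing 1000x542 10-bit case.
header = b"YUV4MPEG2 W1000 H542 F24000:1001 Ip A1:1 C420p10\n"
info = parse_y4m_header(header)
print(info["width"], info["height"], frame_bytes(info))
```

For the 1000x542 10-bit 4:2:0 stream this gives 1,626,000 bytes per frame, and for the working 640x352 stream 675,840 bytes. If a reader assumes a different (for example padded) frame size, it would fall out of sync with the pipe after the first frame.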

rigaya commented 3 months ago

Thank you for the detailed report. The problem seems to be with irregular resolutions that are not divisible by 16.

It should be fixed in NVEnc 7.47.
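
As a quick arithmetic check of the divisibility point, the small Python sketch below tests the two resolutions from the report against a multiple of 16. It only illustrates the arithmetic and is not the NVEnc fix. The helper name `align_up` and the idea of rounding up to the next multiple are assumptions made for illustration.

```python
def align_up(value: int, alignment: int = 16) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def is_mod16(width: int, height: int) -> bool:
    """True if both dimensions are divisible by 16."""
    return width % 16 == 0 and height % 16 == 0


# The working and the failing resolution from the report.
for w, h in [(640, 352), (1000, 542)]:
    print(f"{w}x{h}: mod16={is_mod16(w, h)}, "
          f"remainders=({w % 16}, {h % 16}), "
          f"aligned={align_up(w)}x{align_up(h)}")
```

640x352 is divisible by 16 in both dimensions, while 1000x542 leaves remainders of 8 and 14 (the next multiples are 1008 and 544). That matches which of the two y4m inputs failed.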

rigaya commented 2 months ago

I'm closing this issue, as it should now be fixed.