rigaya / NVEnc

Performance experiments on high-speed encoding with NVENC
https://rigaya34589.blog.fc2.com/blog-category-17.html
Other
1.1k stars 114 forks

Confusion and potential bug in new --tune parameter #583

Closed Dendraspis closed 7 months ago

Dendraspis commented 7 months ago

Hello,

it seems there are two issues with the new --tune parameter.

  1. Whereas --lossless is limited to [H.264/HEVC], no such limitation is noted for --tune lossless. It would be nice to note that in the docs as well, since --tune lossless seems to be an alternative to --lossless.
  2. Given the current official NVEncC v7.50 release...
    NVEncC (x64) 7.50 (r2803) by rigaya, Apr 13 2024 09:00:50 (VC 1929/Win)
    [NVENC API v12.2, CUDA 10.1]
    reader: raw, y4m, avi, avs, vpy, avsw, avhw [H.264/AVC, H.265/HEVC, MPEG2, VP8, VP9, VC-1, MPEG1, MPEG4, AV1]

    ...you get an error message when using --tune lossless with HEVC:

    
    [..]\NVEncC64.exe --avsdll [..]\AviSynth.dll --codec h265 --tune lossless --colormatrix bt709 --colorprim bt709 --transfer bt709 -i A:\_StaxRip-Test\YXY_temp\YXY_NVEncC-Lossless.vpy -o A:\_StaxRip-Test\YXY_temp\YXY_NVEncC-Lossless_out.h265

nvenc : Failed to Initialize the encoder nvenc : .: 8 (NVENC indicates that one or more of the parameter passed to the API call is invalid.)

[..] returned exit code: 1 (0x1)

When you add `--lossless` while keeping `--tune lossless`, the encoder starts running without error messages:

[..]\NVEncC64.exe --avsdll [..]\AviSynth.dll --codec h265 --tune lossless --colormatrix bt709 --colorprim bt709 --transfer bt709 --lossless -i A:\_StaxRip-Test\YXY_temp\YXY_NVEncC-Lossless.vpy -o A:\_StaxRip-Test\YXY_temp\YXY_NVEncC-Lossless_out.h265


Alternative call:
When you run the encode with `--codec av1 --tune lossless --lossless`, you get a warning that your GPU does not support lossless AV1 encoding (at least I do 😅). But when you remove the `--lossless` parameter, you get the same error message as shown above.
rigaya commented 7 months ago

Actually, you should always use plain --lossless for lossless encoding, and never --tune lossless. --tune was added mainly to enable the "uhq" tuning for HEVC introduced in SDK 12.2; the other values are simply listed.

Lossless encoding requires other parameters to be set up as well, which is what --lossless does. Using only --tune lossless skips all of the required checks and settings, so it leads to unexpected results like the ones you described.
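To make this rule concrete, here is a minimal Python sketch of a pre-flight check that a front-end such as StaxRip could run on an argument list before calling NVEncC. The function names and the accepted codec spellings are hypothetical; the two rules it encodes come from this thread: use `--lossless` rather than `--tune lossless` alone, and `--lossless` is documented for H.264/HEVC only. It also assumes NVEncC falls back to H.264 when `--codec` is omitted.

```python
from typing import List, Optional

# Assumption: spellings treated as H.264/HEVC, the codecs --lossless is documented for.
LOSSLESS_CODECS = {"h264", "h265", "hevc"}


def _value_of(args: List[str], flag: str) -> Optional[str]:
    """Return the token that follows `flag`, or None if the flag is absent or last."""
    for i, tok in enumerate(args[:-1]):
        if tok == flag:
            return args[i + 1]
    return None


def lint_lossless_args(args: List[str]) -> List[str]:
    """Hypothetical check: list problems with lossless-related NVEncC options."""
    problems: List[str] = []
    tune = _value_of(args, "--tune")
    # Assumption: H.264 is used when --codec is not given.
    codec = (_value_of(args, "--codec") or "h264").lower()
    has_lossless = "--lossless" in args

    if tune == "lossless" and not has_lossless:
        problems.append("--tune lossless without --lossless: use --lossless instead")
    if has_lossless and codec not in LOSSLESS_CODECS:
        problems.append(f"--lossless is documented for H.264/HEVC only, not {codec}")
    return problems
```

For example, the failing HEVC command from the first post (`--codec h265 --tune lossless` without `--lossless`) yields one problem, while the same command with `--lossless` added yields none.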

Dendraspis commented 7 months ago

Okay, I really didn't expect that. What I did expect is that --tune lossless also sets --lossless and --tune lowlatency also sets --lowlatency.

Just my thoughts: it seems extremely confusing to have two parameters that appear to do the same thing, where using one of them alone can lead to a crash. For lossless and lowlatency, it sounds like you set the tune but don't get what you would expect; instead, you have to set the corresponding parameter --lossless / --lowlatency as well. In the case of --tune lossless, you can't even use it without --lossless. 😕 I don't know the internals behind those parameters, but maybe you should consider merging them, unless there is a good reason to keep them separate.

> --tune is mainly added to enable "uhq" for HEVC added in SDK 12.2, and others are simply just listed.

So does that mean --tune uhq is the only value that changes anything, and all the other values are just "fake"?

Einst1969 commented 7 months ago

Hi rigaya, I was just trying out the new "--tune uhq" option. I did some encodes with and without the parameter, but the results are the same, as if it didn't work. Maybe I'm missing how and when it should be used. Can you help me understand?

Einst1969 commented 7 months ago

The command I use (via StaxRip) is:

C:\Users\fra\Downloads\StaxRip-v2.38.6-x64\Apps\Encoders\NVEncC\NVEncC64.exe --avsdll C:\Users\fra\Downloads\StaxRip-v2.38.6-x64\Apps\FrameServer\AviSynth\AviSynth.dll --qvbr 28 --codec h265 -i "C:\Users\fra\Videos\crowd_run_1080p50 loss_temp\crowd_run_1080p50 loss_new.avs" -o "C:\Users\fra\Videos\crowd_run_1080p50 loss_temp\crowd_run_1080p50 loss_new_out.h265"

and then

C:\Users\fra\Downloads\StaxRip-v2.38.6-x64\Apps\Encoders\NVEncC\NVEncC64.exe --avsdll C:\Users\fra \Downloads\StaxRip-v2.38.6-x64\Apps\FrameServer\AviSynth\AviSynth.dll --qvbr 28 --codec h265 --tune uhq -i "C:\Users\fra\Videos\crowd_run_1080p50 loss_temp\crowd_run_1080p50 loss_new. avs" -o "C:\Users\fra\Videos\crowd_run_1080p50 loss_temp\crowd_run_1080p50 loss_new_out.h265"

rigaya commented 7 months ago

I've decided to remove --tune from NVEnc 7.51, as it seems to be confusing, and the settings it covers can (or should) be set through other options.

The NVIDIA documentation says "UHQ Tuning Info enables lookahead and temporal Filter, that have higher memory requirements.", so it seems to be a combination of lookahead and tf-level. Therefore, --tune uhq should be achievable with -b 4 --tf-level <int> --lookahead <int> --lookahead-level <int>.

It is also now quite understandable that it requires a Turing or newer GPU, as tf-level requires bframes >= 4 and only Turing and newer GPUs support HEVC B-frames.
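The reasoning here can be written down as a small Python sketch that expands `--tune uhq` into the explicit options listed above and enforces the two constraints mentioned: tf-level needs at least 4 B-frames, and HEVC B-frames need a Turing or newer GPU. The helper name, its parameters, and the `turing_or_newer` flag (which the caller has to determine) are hypothetical; the concrete values for `--tf-level`, `--lookahead`, and `--lookahead-level` are not given here and are left to the caller.

```python
from typing import List


def uhq_equivalent_opts(bframes: int, tf_level: int, lookahead: int,
                        lookahead_level: int, turing_or_newer: bool) -> List[str]:
    """Hypothetical helper: spell out --tune uhq for HEVC as explicit NVEncC options."""
    if not turing_or_newer:
        # HEVC B-frames (and therefore tf-level) are only available on Turing+.
        raise ValueError("HEVC B-frames require a Turing or newer GPU")
    if bframes < 4:
        # tf-level requires bframes >= 4.
        raise ValueError("--tf-level requires at least 4 B-frames (-b 4 or more)")
    return [
        "--codec", "h265",
        "-b", str(bframes),
        "--tf-level", str(tf_level),
        "--lookahead", str(lookahead),
        "--lookahead-level", str(lookahead_level),
    ]
```

Raising early mirrors the point of the thread: when a prerequisite is missing, it is clearer to fail with a specific message than to pass the options to the encoder and get a generic initialization error.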

For other --tune parameters, please check the release note of NVEnc 7.51.

Dendraspis commented 7 months ago

Thanks for clearing that up - I would say it is easier to use now. 👍

L4cache commented 4 months ago

> Therefore, --tune uhq should be achievable with -b 4 --tf-level <int> --lookahead <int> --lookahead-level <int>.

Based on inspecting the produced bitstream (encoded with FFmpeg), I think they use 5 B-frames; based on FFmpeg's verbose log output, the lookahead is 25; and lookahead-level should be 1, which is the current maximum. However, setting these parameters does not reproduce exactly the same output, nor the same drop in encoding speed. The quality is, well, close enough, I guess. I used FFmpeg for these experiments because NVEncC's uhq tune does not seem to work, as Einst1969 reported.
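A minimal Python sketch of this kind of speed comparison, assuming the values L4cache inferred (5 B-frames, lookahead 25, lookahead level 1). The thread does not say which `--tf-level` value UHQ uses, so it is a parameter here; `explicit_uhq_opts` and `time_encode` are hypothetical helpers, and `time_encode` measures only wall-clock time, not output quality or bitstream equality.

```python
import subprocess
import time
from typing import List, Tuple

# Values L4cache inferred for UHQ (bitstream inspection and FFmpeg's verbose log),
# not values confirmed by NVIDIA documentation.
OBSERVED_UHQ = {"bframes": 5, "lookahead": 25, "lookahead_level": 1}


def explicit_uhq_opts(tf_level: int) -> List[str]:
    """Explicit NVEncC options approximating UHQ; tf_level is caller-supplied."""
    return [
        "-b", str(OBSERVED_UHQ["bframes"]),
        "--tf-level", str(tf_level),
        "--lookahead", str(OBSERVED_UHQ["lookahead"]),
        "--lookahead-level", str(OBSERVED_UHQ["lookahead_level"]),
    ]


def time_encode(cmd: List[str]) -> Tuple[int, float]:
    """Run one command to completion and return (exit code, elapsed seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, check=False)
    return result.returncode, time.perf_counter() - start
```

For example, one could time `NVEncC64.exe --codec h265 --qvbr 28 -i in.avs -o plain.h265` against the same command with a different output name and `explicit_uhq_opts(...)` appended; the file names are placeholders, and each run should be repeated a few times because single wall-clock timings are noisy.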