rigaya / NVEnc

Performance experiments on high-speed encoding with NVENC
https://rigaya34589.blog.fc2.com/blog-category-17.html

5.12 reduction in fps #257

Closed johnvick closed 4 years ago

johnvick commented 4 years ago

I use NVEnc with Staxrip to encode 1080 TV rips. I have a GeForce GTX 1660 with the latest drivers as of today. With 5.12 I noticed a reduction in encoding frame rate from approximately 160 fps to 120 fps.

rigaya commented 4 years ago

I think this is because the encoder presets changed with the update of the NVENC API version. NVEnc has supported the latest NVENC API 10 since version 5.10.

The encoder presets changed from 3 levels (performance, default, quality) in the previous NVENC API 9.1 to 7 levels (P1 to P7) in NVENC API 10. They do not always correspond to each other, so encode performance can change even if you use the same options. Please refer to pp. 20-22 of the PDF below, which explains this further.

http://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21337-nvidia-video-technologies-video-codec-and-optical-flow-sdk.pdf

To adjust the encode performance, you can change the encoder preset with the --preset option.

johnvick commented 4 years ago

Thanks for the prompt reply and for your work on this; it is appreciated. The settings I use in Staxrip generate this command line:

--vbrhq 2000 --codec h265 --preset quality --profile main10 --output-depth 10 --vpp-edgelevel strength=10,threshold=15,black=5,white=2 --gop-len 240 --lookahead 16 --slices 2 --strict-gop --nonrefp --cuda-schedule auto --colormatrix bt709 --colorprim bt709 --transfer bt709

What would I need to change to get similar results and encoding speeds with your latest version?

RCH-9 commented 4 years ago

I'm also noticing a similar drop in performance with HEVC and VBR. According to the graph on page 20, the fps drop from the old HQ to P7 could be as much as 10% but I'm seeing a drop of ~25%. The graph also says P6 should give speed parity to the old HQ setting although it's still ~25% slower for me.
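The percentages quoted in this thread are relative slowdowns. As a minimal sketch (the function name `percent_drop` is made up for illustration), the 160 fps to 120 fps change reported at the top of the issue works out like this:

```python
def percent_drop(old_fps: float, new_fps: float) -> float:
    """Return the slowdown from old_fps to new_fps as a percentage of old_fps."""
    if old_fps <= 0:
        raise ValueError("old_fps must be positive")
    return (old_fps - new_fps) / old_fps * 100.0


# Figures reported at the top of this issue: about 160 fps before, 120 fps after.
drop = percent_drop(160.0, 120.0)
print(f"{drop:.1f}% slower")
```

That is a 25% drop, which is in line with the ~25% mentioned above and well beyond the roughly 10% read off the graph.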

I don't mind if there are tangible benefits to quality or something, but it's a big drop if there is nothing to show for it.

johnvick commented 4 years ago

I have just updated Staxrip to 2.1.3.0 and fps with 5.12 is back up to 160 with no changes to settings, so the problem is resolved as far as I can see.

RCH-9 commented 4 years ago

> I have just updated Staxrip to 2.1.3.0 and fps with 5.12 is back up to 160 with no changes to settings, so the problem is resolved as far as I can see.

Does your Staxrip command line output look exactly the same after the update?

johnvick commented 4 years ago

Ah - the template somehow changed when loaded into the new Staxrip. Didn't spot that. I have fixed some of the changes and am now getting 120 fps. I'll look into it more tomorrow.

johnvick commented 4 years ago

I have retested with identical command lines and the fps is now 120 compared to the old 160. Staxrip only offers the three levels (performance, default, quality). Maybe it's a case of waiting until Staxrip is updated to offer the new P1-P7 levels in API10 or else manually changing the command line it generates.

RCH-9 commented 4 years ago

> I have retested with identical command lines and the fps is now 120 compared to the old 160. Staxrip only offers the three levels (performance, default, quality). Maybe it's a case of waiting until Staxrip is updated to offer the new P1-P7 levels in API10 or else manually changing the command line it generates.

I'm using this tool without Staxrip and it does the same thing. According to the NVEncC docs, the old quality setting is the same as using P7 in the new API, but P7 and P6 both drop the fps to ~120. Trying P5 doubles the fps to ~240.

johnvick commented 4 years ago

I've just posted on the Staxrip Github page to ask if they will be updating to accommodate the new levels.

rigaya commented 4 years ago

I'll close this issue, as the new preset support has come in StaxRip.

RCH-9 commented 4 years ago

I don't think this issue has been resolved as StaxRip wasn't the cause.

The problem is that the new best-quality setting in NVEncC (P7) is 30% slower than the old NVEncC best-quality setting.

According to the NVIDIA docs, the old HQ is the same as P6 in terms of speed and compression, but in testing, P6 is 15-20% slower in NVEncC even though they output the same file size.

rigaya commented 4 years ago

I don't see a problem here; I can reproduce the result in the doc, as shown below. Testing with pure HEVC VBR and no other encoding options should reproduce the result in the NVIDIA doc.

In that case, "old quality" and P6 give the same performance, bitrate and SSIM.

HEVC VBR: Old quality (API v9)

Y:\QSVTest>x64\NVEncC64_5.08.exe -i sakura_op.mpg -o F:\temp\test.mp4
  -c hevc --vbr 6000 --output-res 1920x1080 -u quality --ssim
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.08 (r1585) by rigaya, Jul  1 2020 22:41:25 (VC 1926/Win/avx2)
OS Version     Windows 10 x64 (18363)
CPU            Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU            #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA   NVENC API 9.1, CUDA 11.0, schedule mode: auto
Input Buffers  CUDA, 20 frames
Input Info     avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters    copyDtoD
               ssim (yv12)
Output Info    H.265/HEVC main @ Level auto
               1920x1080p 1:1 30.000fps (30/1fps)
               avwriter: hevc => mp4
Encoder Preset quality
Rate Control   VBR
Bitrate        6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     300 frames
B frames       3 frames [ref mode: disabled]
Ref frames     3 frames, MultiRef L0:auto L1:auto
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 3501 frames, 166.33 fps, 5383.98 kbps, 74.90 MB
encode time 0:00:21, CPU: 3.1%, GPU: 6.3%, VE: 94.3%, VD: 20.6%, GPUClock: 1895MHz, VEClock: 1761MHz
frame type IDR   12
frame type I     12,  avgQP  15.92,  total size   1.12 MB
frame type P    875,  avgQP  16.03,  total size  44.46 MB
frame type B   2614,  avgQP  21.34,  total size  29.32 MB
ssim/psnr: SSIM YUV: 0.994445 (22.553199), 0.994991 (23.002362), 0.994453 (22.559126), All: 0.994537 (22.625911), 
(Frames: 3501)
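A note on reading the `ssim/psnr` line: the value in parentheses after each SSIM figure appears to be the same score on a decibel scale, computed as -10 * log10(1 - SSIM). This is inferred from the numbers in the log rather than taken from the NVEncC documentation, so treat it as an assumption. A minimal sketch (the helper name `ssim_to_db` is made up for illustration):

```python
import math


def ssim_to_db(ssim: float) -> float:
    """Convert an SSIM score in [0, 1) to decibels as -10 * log10(1 - SSIM).

    Assumption: this is the formula behind the parenthesized values in the
    log above; it is inferred from the printed numbers, not from the docs.
    """
    if not 0.0 <= ssim < 1.0:
        raise ValueError("ssim must be in [0, 1)")
    return -10.0 * math.log10(1.0 - ssim)


# Y-plane SSIM from the log above; the log prints 22.553199 next to it.
print(round(ssim_to_db(0.994445), 3))
```

The result agrees with the logged value to within the rounding of the six-digit SSIM input.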

HEVC VBR: New P6 (API v10)

Y:\QSVTest>x64\NVEncC64.exe -i sakura_op.mpg -o F:\temp\test.mp4
 -c hevc --vbr 6000 --output-res 1920x1080 -u P6 --ssim
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.15 (r1658) by rigaya, Sep 12 2020 23:40:28 (VC 1927/Win/avx2)
OS Version     Windows 10 x64 (18363)
CPU            Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU            #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA   NVENC API 10.0, CUDA 11.0, schedule mode: auto
Input Buffers  CUDA, 20 frames
Input Info     avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters    copyDtoD
               ssim (yv12)
Output Info    H.265/HEVC main @ Level auto
               1920x1080p 1:1 30.000fps (30/1fps)
               avwriter: hevc => mp4
Encoder Preset P6
Rate Control   VBR
Multipass      none
Bitrate        6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     300 frames
B frames       3 frames [ref mode: disabled]
Ref frames     3 frames, MultiRef L0:auto L1:auto
AQ             off
CU max / min   auto / auto
Others         mv:auto repeat-headers

encoded 3501 frames, 165.81 fps, 5384.05 kbps, 74.90 MB
encode time 0:00:21, CPU: 3.1%, GPU: 6.2%, VE: 94.7%, VD: 20.7%, GPUClock: 1886MHz, VEClock: 1752MHz
frame type IDR   12
frame type I     12,  avgQP  15.92,  total size   1.12 MB
frame type P    875,  avgQP  16.03,  total size  44.46 MB
frame type B   2614,  avgQP  21.34,  total size  29.32 MB
ssim/psnr: SSIM YUV: 0.994445 (22.553199), 0.994991 (23.002362), 0.994453 (22.559126), All: 0.994537 (22.625911), 
(Frames: 3501)

Furthermore, "old default" and P5 give the same performance, bitrate and SSIM, which is also stated in the doc.

HEVC VBR: Old default (API v9)

Y:\QSVTest>x64\NVEncC64_5.08.exe -i sakura_op.mpg -o F:\temp\test.mp4
  -c hevc --vbr 6000 --output-res 1920x1080 -u default --ssim
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.08 (r1585) by rigaya, Jul  1 2020 22:41:25 (VC 1926/Win/avx2)
OS Version     Windows 10 x64 (18363)
CPU            Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU            #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA   NVENC API 9.1, CUDA 11.0, schedule mode: auto
Input Buffers  CUDA, 20 frames
Input Info     avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters    copyDtoD
               ssim (yv12)
Output Info    H.265/HEVC main @ Level auto
               1920x1080p 1:1 30.000fps (30/1fps)
               avwriter: hevc => mp4
Encoder Preset default
Rate Control   VBR
Bitrate        6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     300 frames
B frames       3 frames [ref mode: disabled]
Ref frames     3 frames, MultiRef L0:auto L1:auto
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 3501 frames, 295.59 fps, 5387.74 kbps, 74.95 MB
encode time 0:00:11, CPU: 3.4%, GPU: 10.9%, VE: 90.8%, VD: 36.0%, GPUClock: 1858MHz, VEClock: 1725MHz
frame type IDR   12
frame type I     12,  avgQP  15.92,  total size   1.16 MB
frame type P    875,  avgQP  16.07,  total size  44.72 MB
frame type B   2614,  avgQP  21.41,  total size  29.07 MB
ssim/psnr: SSIM YUV: 0.994426 (22.537960), 0.994915 (22.937011), 0.994367 (22.492613), All: 0.994497 (22.594271),
(Frames: 3501)

HEVC VBR: New P5 (API v10)

Y:\QSVTest>x64\NVEncC64.exe -i sakura_op.mpg -o F:\temp\test.mp4 
 -c hevc --vbr 6000 --output-res 1920x1080 -u P5 --ssim
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.15 (r1658) by rigaya, Sep 12 2020 23:40:28 (VC 1927/Win/avx2)
OS Version     Windows 10 x64 (18363)
CPU            Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU            #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA   NVENC API 10.0, CUDA 11.0, schedule mode: auto
Input Buffers  CUDA, 20 frames
Input Info     avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters    copyDtoD
               ssim (yv12)
Output Info    H.265/HEVC main @ Level auto
               1920x1080p 1:1 30.000fps (30/1fps)
               avwriter: hevc => mp4
Encoder Preset P5
Rate Control   VBR
Multipass      none
Bitrate        6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     300 frames
B frames       3 frames [ref mode: disabled]
Ref frames     3 frames, MultiRef L0:auto L1:auto
AQ             off
CU max / min   auto / auto
Others         mv:auto repeat-headers

encoded 3501 frames, 291.02 fps, 5387.81 kbps, 74.95 MB
encode time 0:00:12, CPU: 3.4%, GPU: 11.4%, VE: 90.6%, VD: 36.5%, GPUClock: 1858MHz, VEClock: 1725MHz
frame type IDR   12
frame type I     12,  avgQP  15.92,  total size   1.16 MB
frame type P    875,  avgQP  16.07,  total size  44.72 MB
frame type B   2614,  avgQP  21.41,  total size  29.07 MB
ssim/psnr: SSIM YUV: 0.994426 (22.537960), 0.994915 (22.937011), 0.994367 (22.492613), All: 0.994497 (22.594271),
(Frames: 3501)
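The four runs above can be compared side by side by parsing their `encoded ... frames` summary lines. The sketch below assumes the summary format shown in these logs; the helper names `parse_summary` and `fps_gap_percent` are made up for illustration and are not part of NVEncC.

```python
import re

# Matches the summary line printed at the end of each run in the logs above.
SUMMARY_RE = re.compile(
    r"encoded (?P<frames>\d+) frames, (?P<fps>[\d.]+) fps, "
    r"(?P<kbps>[\d.]+) kbps, (?P<mb>[\d.]+) MB"
)


def parse_summary(line: str) -> dict:
    """Extract frame count, fps, bitrate (kbps) and size (MB) from a summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not a summary line: {line!r}")
    return {
        "frames": int(m["frames"]),
        "fps": float(m["fps"]),
        "kbps": float(m["kbps"]),
        "mb": float(m["mb"]),
    }


def fps_gap_percent(line_a: str, line_b: str) -> float:
    """Return the fps difference between two runs as a percentage of the faster run."""
    fa = parse_summary(line_a)["fps"]
    fb = parse_summary(line_b)["fps"]
    return abs(fa - fb) / max(fa, fb) * 100.0


# Summary lines copied from the four logs above.
runs = {
    "old quality (5.08)": "encoded 3501 frames, 166.33 fps, 5383.98 kbps, 74.90 MB",
    "new P6 (5.15)": "encoded 3501 frames, 165.81 fps, 5384.05 kbps, 74.90 MB",
    "old default (5.08)": "encoded 3501 frames, 295.59 fps, 5387.74 kbps, 74.95 MB",
    "new P5 (5.15)": "encoded 3501 frames, 291.02 fps, 5387.81 kbps, 74.95 MB",
}

for name, line in runs.items():
    s = parse_summary(line)
    print(f"{name:20s} {s['fps']:7.2f} fps  {s['kbps']:8.2f} kbps  {s['mb']:.2f} MB")
```

With these figures, old quality and P6 differ by well under 1% in fps, and old default and P5 by under 2%, with matching output sizes in both pairs.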

However, adding other encoding options changes the result. For example, the preset behavior seems to differ when vbrhq/multipass is used, and "old quality" and "new P6" are not equivalent in that case. It is difficult to know which settings are equivalent here, as the details of the new and old presets are unknown.

Anyway, I think the preset implementation is correct, as I was able to reproduce the result in the doc.

RCH-9 commented 4 years ago

Apologies, I'm not trying to cause more work or problems for you. ;)

I think I may have an idea of the problem. I have been using "--vbrhq 0" in NVEncC, and its behavior has evidently changed with the new API. "--vbrhq 0" is now "--vbr 0 --multipass 2pass-full", but if I use "--vbr 0 --multipass 2pass-quarter", it has about the same speed as the old "--vbrhq 0".

So, was the old vbrhq 0 using the equivalent of quarter and the new one uses full by default? If so, that would explain the speed drop.

rigaya commented 4 years ago

It seems you get the same result (bitrate, performance and SSIM) with "Old VBRHQ default" and "VBR P5 2pass-quarter", as you said.

API v9: VBRHQ default

Y:\QSVTest>x64\NVEncC64_5.08.exe -i sakura_op.mpg -o F:\temp\test.mp4
-c hevc --ssim --output-res 1920x1080 --vbrhq 6000
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.08 (r1585) by rigaya, Jul  1 2020 22:41:25 (VC 1926/Win/avx2)
OS Version     Windows 10 x64 (18363)
CPU            Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU            #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA   NVENC API 9.1, CUDA 11.0, schedule mode: auto
Input Buffers  CUDA, 20 frames
Input Info     avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters    copyDtoD
               ssim (yv12)
Output Info    H.265/HEVC main @ Level auto
               1920x1080p 1:1 30.000fps (30/1fps)
               avwriter: hevc => mp4
Encoder Preset default
Rate Control   VBRHQ
Bitrate        6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     300 frames
B frames       3 frames [ref mode: disabled]
Ref frames     3 frames, MultiRef L0:auto L1:auto
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 3501 frames, 289.36 fps, 5379.88 kbps, 74.84 MB
encode time 0:00:12, CPU: 3.1%, GPU: 11.3%, VE: 98.0%, VD: 36.4%, GPUClock: 1858MHz, VEClock: 1725MHz
frame type IDR   12
frame type I     12,  avgQP  15.33,  total size   1.22 MB
frame type P    875,  avgQP  15.97,  total size  44.43 MB
frame type B   2614,  avgQP  21.37,  total size  29.20 MB
ssim/psnr: SSIM YUV: 0.994431 (22.541940), 0.994903 (22.926943), 0.994357 (22.484768), All: 0.994497 (22.594065), 
(Frames: 3501)

API v10: VBR P5 2pass-quarter

Y:\QSVTest>x64\NVEncC64.exe -i sakura_op.mpg -o F:\temp\test.mp4
 -c hevc --ssim --vbr 6000 --output-res 1920x1080 --multipass 2pass-quarter --preset P5
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.15 (r1658) by rigaya, Sep 12 2020 23:40:28 (VC 1927/Win/avx2)
OS Version     Windows 10 x64 (18363)
CPU            Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU            #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA   NVENC API 10.0, CUDA 11.0, schedule mode: auto
Input Buffers  CUDA, 20 frames
Input Info     avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters    copyDtoD
               ssim (yv12)
Output Info    H.265/HEVC main @ Level auto
               1920x1080p 1:1 30.000fps (30/1fps)
               avwriter: hevc => mp4
Encoder Preset P5
Rate Control   VBR
Multipass      2pass-quarter
Bitrate        6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     300 frames
B frames       3 frames [ref mode: disabled]
Ref frames     3 frames, MultiRef L0:auto L1:auto
AQ             off
CU max / min   auto / auto
Others         mv:auto repeat-headers

encoded 3501 frames, 289.63 fps, 5379.95 kbps, 74.84 MB
encode time 0:00:12, CPU: 3.1%, GPU: 10.5%, VE: 91.5%, VD: 34.0%, GPUClock: 1860MHz, VEClock: 1727MHz
frame type IDR   12
frame type I     12,  avgQP  15.33,  total size   1.22 MB
frame type P    875,  avgQP  15.97,  total size  44.43 MB
frame type B   2614,  avgQP  21.37,  total size  29.20 MB
ssim/psnr: SSIM YUV: 0.994431 (22.541940), 0.994903 (22.926943), 0.994357 (22.484768), All: 0.994497 (22.594065),
(Frames: 3501)

RCH-9 commented 4 years ago

Thanks for confirming, I guess the speed drop mystery is now solved...:)
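For anyone migrating an old command line, the equivalences observed in this thread can be summarized in a small sketch. It uses only options that appear in the logs above. The function name `migrate_args` and the mapping table are made up for illustration, and the rules are limited to what was measured here: old "quality" matched P6 and old "default" matched P5 with plain VBR, and old VBRHQ at the default preset matched VBR with P5 and 2pass-quarter. The combination of `--vbrhq` with the old quality preset was reported above as not equivalent to P6, so the output for that case is a starting point only.

```python
# Old preset name -> new preset that matched it in the plain VBR tests above.
OLD_TO_NEW_PRESET = {"quality": "P6", "default": "P5"}
PRESET_FLAGS = ("--preset", "-u")


def migrate_args(old_cmdline: str) -> str:
    """Rewrite old-style NVEncC options using the equivalences seen in this thread.

    Naive whitespace splitting is used, so quoted arguments and paths with
    spaces are not handled. Unknown options are passed through unchanged.
    """
    args = old_cmdline.split()
    out = []
    saw_preset = False
    i = 0
    while i < len(args):
        arg = args[i]
        value = args[i + 1] if i + 1 < len(args) else None
        if arg == "--vbrhq" and value is not None:
            # Old VBRHQ matched VBR with quarter-resolution two-pass.
            out += ["--vbr", value, "--multipass", "2pass-quarter"]
            i += 2
        elif arg in PRESET_FLAGS and value is not None:
            out += [arg, OLD_TO_NEW_PRESET.get(value, value)]
            saw_preset = True
            i += 2
        else:
            out.append(arg)
            i += 1
    if not saw_preset:
        # No preset given means the old default, which matched P5 above.
        out += ["--preset", "P5"]
    return " ".join(out)


print(migrate_args("-c hevc --ssim --output-res 1920x1080 --vbrhq 6000"))
```

The printed options correspond to the "API v10: VBR P5 2pass-quarter" run above, apart from option order.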