Closed johnvick closed 4 years ago
I think this is because the encoder presets changed with the update of the NVENC API version. NVEnc has supported the latest NVENC API 10 since version 5.10.
The encoder presets changed from 3 levels (performance, default, quality) in the previous NVENC API 9.1 to 7 levels (P1 to P7) in NVENC API 10. The old and new levels do not always correspond to each other, so encode performance can change even if you use the same options. Please refer to p.20 - p.22 of the pdf below, which explains this further.
To adjust the encode performance, you can change the encoder preset with the --preset option.
Thanks for the prompt reply and for your work on this; it is appreciated. The settings I use in Staxrip generate this command line:
--vbrhq 2000 --codec h265 --preset quality --profile main10 --output-depth 10 --vpp-edgelevel strength=10,threshold=15,black=5,white=2 --gop-len 240 --lookahead 16 --slices 2 --strict-gop --nonrefp --cuda-schedule auto --colormatrix bt709 --colorprim bt709 --transfer bt709
What would I need to change to get similar results and encoding speeds with your latest version?
I'm also noticing a similar drop in performance with HEVC and VBR. According to the graph on page 20, the fps drop from the old HQ to P7 could be as much as 10%, but I'm seeing a drop of ~25%. The graph also says P6 should give speed parity with the old HQ setting, yet it's still ~25% slower for me.
I don't mind if there are tangible benefits to quality or something, but it's a big drop if there is nothing to show for it.
I have just updated Staxrip to 2.1.3.0 and fps with 5.12 is back up to 160 with no changes to settings, so the problem is resolved as far as I can see.
Does your Staxrip command line output look exactly the same after the update?
Ah - the template somehow changed when loaded into the new Staxrip. Didn't spot that. I have fixed some of the changes and am now getting 120 fps. I'll look into it more tomorrow.
I have retested with identical command lines and the fps is now 120 compared to the old 160. Staxrip only offers the three levels (performance, default, quality). Maybe it's a case of waiting until Staxrip is updated to offer the new P1-P7 levels in API10 or else manually changing the command line it generates.
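To put the reported numbers in perspective, here is a minimal sketch of the arithmetic, using only the 160 fps and 120 fps figures from this thread. The function names are just for illustration.

```python
def percent_drop(old_fps, new_fps):
    """Slowdown of new_fps relative to old_fps, in percent."""
    return (old_fps - new_fps) / old_fps * 100.0


def percent_speedup_needed(old_fps, new_fps):
    """Speedup (in percent of new_fps) needed to get back to old_fps."""
    return (old_fps - new_fps) / new_fps * 100.0


# Figures reported in this thread: 160 fps before, 120 fps after.
print(f"drop: {percent_drop(160, 120):.1f}%")
print(f"speedup needed to recover: {percent_speedup_needed(160, 120):.1f}%")
```

So going from 160 fps to 120 fps is a 25% drop, and getting back to 160 fps needs roughly a one-third speedup from the new figure.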
I'm using this tool without Staxrip and it does the same thing. According to the NVEncC docs, the old quality setting is the same as using P7 in the new API, but P7 and P6 both drop the fps to ~120. Trying P5 doubles the fps to ~240.
I've just posted on the Staxrip Github page to ask if they will be updating to accommodate the new levels.
I'll close this issue, now that StaxRip supports the new presets.
I don't think this issue has been resolved as StaxRip wasn't the cause.
The problem is that the new Best Quality setting in NVEncC (P7) is 30% slower than the old NVEncC Best Quality setting.
According to the NVIDIA docs, the old HQ is the same as P6 in terms of speed and compression, but in testing, P6 is 15-20% slower in NVEncC even though both produce the same file size.
I don't see a problem here; I can reproduce the result in the doc, as shown below. By testing with pure HEVC VBR without any other encoding options, we should be able to reproduce the result of the NVIDIA doc.
In that case, we get the same performance, bitrate and SSIM with "old quality" and P6.
HEVC VBR: Old quality (API v9)
Y:\QSVTest>x64\NVEncC64_5.08.exe -i sakura_op.mpg -o F:\temp\test.mp4
-c hevc --vbr 6000 --output-res 1920x1080 -u quality --ssim
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.08 (r1585) by rigaya, Jul 1 2020 22:41:25 (VC 1926/Win/avx2)
OS Version Windows 10 x64 (18363)
CPU Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA NVENC API 9.1, CUDA 11.0, schedule mode: auto
Input Buffers CUDA, 20 frames
Input Info avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters copyDtoD
ssim (yv12)
Output Info H.265/HEVC main @ Level auto
1920x1080p 1:1 30.000fps (30/1fps)
avwriter: hevc => mp4
Encoder Preset quality
Rate Control VBR
Bitrate 6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead off
GOP length 300 frames
B frames 3 frames [ref mode: disabled]
Ref frames 3 frames, MultiRef L0:auto L1:auto
AQ off
CU max / min auto / auto
Others mv:auto
encoded 3501 frames, 166.33 fps, 5383.98 kbps, 74.90 MB
encode time 0:00:21, CPU: 3.1%, GPU: 6.3%, VE: 94.3%, VD: 20.6%, GPUClock: 1895MHz, VEClock: 1761MHz
frame type IDR 12
frame type I 12, avgQP 15.92, total size 1.12 MB
frame type P 875, avgQP 16.03, total size 44.46 MB
frame type B 2614, avgQP 21.34, total size 29.32 MB
ssim/psnr: SSIM YUV: 0.994445 (22.553199), 0.994991 (23.002362), 0.994453 (22.559126), All: 0.994537 (22.625911),
(Frames: 3501)
HEVC VBR: New P6 (API v10)
Y:\QSVTest>x64\NVEncC64.exe -i sakura_op.mpg -o F:\temp\test.mp4
-c hevc --vbr 6000 --output-res 1920x1080 -u P6 --ssim
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.15 (r1658) by rigaya, Sep 12 2020 23:40:28 (VC 1927/Win/avx2)
OS Version Windows 10 x64 (18363)
CPU Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA NVENC API 10.0, CUDA 11.0, schedule mode: auto
Input Buffers CUDA, 20 frames
Input Info avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters copyDtoD
ssim (yv12)
Output Info H.265/HEVC main @ Level auto
1920x1080p 1:1 30.000fps (30/1fps)
avwriter: hevc => mp4
Encoder Preset P6
Rate Control VBR
Multipass none
Bitrate 6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead off
GOP length 300 frames
B frames 3 frames [ref mode: disabled]
Ref frames 3 frames, MultiRef L0:auto L1:auto
AQ off
CU max / min auto / auto
Others mv:auto repeat-headers
encoded 3501 frames, 165.81 fps, 5384.05 kbps, 74.90 MB
encode time 0:00:21, CPU: 3.1%, GPU: 6.2%, VE: 94.7%, VD: 20.7%, GPUClock: 1886MHz, VEClock: 1752MHz
frame type IDR 12
frame type I 12, avgQP 15.92, total size 1.12 MB
frame type P 875, avgQP 16.03, total size 44.46 MB
frame type B 2614, avgQP 21.34, total size 29.32 MB
ssim/psnr: SSIM YUV: 0.994445 (22.553199), 0.994991 (23.002362), 0.994453 (22.559126), All: 0.994537 (22.625911),
(Frames: 3501)
Furthermore, we get the same performance, bitrate and SSIM with "old default" and P5, which is also stated in the doc.
HEVC VBR: Old default (API v9)
Y:\QSVTest>x64\NVEncC64_5.08.exe -i sakura_op.mpg -o F:\temp\test.mp4
-c hevc --vbr 6000 --output-res 1920x1080 -u default --ssim
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.08 (r1585) by rigaya, Jul 1 2020 22:41:25 (VC 1926/Win/avx2)
OS Version Windows 10 x64 (18363)
CPU Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA NVENC API 9.1, CUDA 11.0, schedule mode: auto
Input Buffers CUDA, 20 frames
Input Info avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters copyDtoD
ssim (yv12)
Output Info H.265/HEVC main @ Level auto
1920x1080p 1:1 30.000fps (30/1fps)
avwriter: hevc => mp4
Encoder Preset default
Rate Control VBR
Bitrate 6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead off
GOP length 300 frames
B frames 3 frames [ref mode: disabled]
Ref frames 3 frames, MultiRef L0:auto L1:auto
AQ off
CU max / min auto / auto
Others mv:auto
encoded 3501 frames, 295.59 fps, 5387.74 kbps, 74.95 MB
encode time 0:00:11, CPU: 3.4%, GPU: 10.9%, VE: 90.8%, VD: 36.0%, GPUClock: 1858MHz, VEClock: 1725MHz
frame type IDR 12
frame type I 12, avgQP 15.92, total size 1.16 MB
frame type P 875, avgQP 16.07, total size 44.72 MB
frame type B 2614, avgQP 21.41, total size 29.07 MB
ssim/psnr: SSIM YUV: 0.994426 (22.537960), 0.994915 (22.937011), 0.994367 (22.492613), All: 0.994497 (22.594271),
(Frames: 3501)
HEVC VBR: New P5 (API v10)
Y:\QSVTest>x64\NVEncC64.exe -i sakura_op.mpg -o F:\temp\test.mp4
-c hevc --vbr 6000 --output-res 1920x1080 -u P5 --ssim
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.15 (r1658) by rigaya, Sep 12 2020 23:40:28 (VC 1927/Win/avx2)
OS Version Windows 10 x64 (18363)
CPU Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA NVENC API 10.0, CUDA 11.0, schedule mode: auto
Input Buffers CUDA, 20 frames
Input Info avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters copyDtoD
ssim (yv12)
Output Info H.265/HEVC main @ Level auto
1920x1080p 1:1 30.000fps (30/1fps)
avwriter: hevc => mp4
Encoder Preset P5
Rate Control VBR
Multipass none
Bitrate 6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead off
GOP length 300 frames
B frames 3 frames [ref mode: disabled]
Ref frames 3 frames, MultiRef L0:auto L1:auto
AQ off
CU max / min auto / auto
Others mv:auto repeat-headers
encoded 3501 frames, 291.02 fps, 5387.81 kbps, 74.95 MB
encode time 0:00:12, CPU: 3.4%, GPU: 11.4%, VE: 90.6%, VD: 36.5%, GPUClock: 1858MHz, VEClock: 1725MHz
frame type IDR 12
frame type I 12, avgQP 15.92, total size 1.16 MB
frame type P 875, avgQP 16.07, total size 44.72 MB
frame type B 2614, avgQP 21.41, total size 29.07 MB
ssim/psnr: SSIM YUV: 0.994426 (22.537960), 0.994915 (22.937011), 0.994367 (22.492613), All: 0.994497 (22.594271),
(Frames: 3501)
However, once other encoding options are added, the results will differ. For example, the preset behavior seems to differ when vbrhq/multipass is used, and "old quality" and "new P6" are not equivalent in that case. It is difficult to know which settings are equivalent here, as the details of the old and new presets are unknown.
Anyway, I think the preset implementation is correct, as I was able to reproduce the results in the doc.
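For anyone who wants to repeat this comparison on their own logs, here is a minimal sketch that parses the "encoded ... frames" summary line and reports the fps difference between two runs. The regular expression assumes the summary format shown in the logs above; the helper names are hypothetical and not part of NVEncC.

```python
import re

# Matches the summary line printed at the end of the logs in this thread.
SUMMARY = re.compile(
    r"encoded (?P<frames>\d+) frames, (?P<fps>[\d.]+) fps, "
    r"(?P<kbps>[\d.]+) kbps, (?P<mb>[\d.]+) MB"
)


def parse_summary(line):
    """Extract frames, fps, bitrate and size from a summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not a summary line: {line!r}")
    return {
        "frames": int(m["frames"]),
        "fps": float(m["fps"]),
        "kbps": float(m["kbps"]),
        "mb": float(m["mb"]),
    }


def fps_change_percent(old_line, new_line):
    """Signed fps change of the new run relative to the old run."""
    old = parse_summary(old_line)["fps"]
    new = parse_summary(new_line)["fps"]
    return (new - old) / old * 100.0


# Summary lines copied from the four plain VBR runs above.
runs = {
    "old quality vs P6": (
        "encoded 3501 frames, 166.33 fps, 5383.98 kbps, 74.90 MB",
        "encoded 3501 frames, 165.81 fps, 5384.05 kbps, 74.90 MB",
    ),
    "old default vs P5": (
        "encoded 3501 frames, 295.59 fps, 5387.74 kbps, 74.95 MB",
        "encoded 3501 frames, 291.02 fps, 5387.81 kbps, 74.95 MB",
    ),
}

for name, (old_line, new_line) in runs.items():
    print(f"{name}: {fps_change_percent(old_line, new_line):+.2f}% fps")
```

For both pairs the difference is under 2%, which is small next to the ~25% drop reported earlier in the thread.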
Apologies, I'm not trying to cause more work or problems for you. ;)
I think I may have an idea of the problem: I have been using "--vbrhq 0" in NVEncC, and its behavior has evidently changed with the new API. "--vbrhq 0" is now "--vbr 0 --multipass 2pass-full", but if I use "--vbr 0 --multipass 2pass-quarter", it has about the same speed as the old "--vbrhq 0".
So, was the old vbrhq 0 using the equivalent of quarter, while the new one uses full by default? If so, that would explain the speed drop.
It seems you get the same result (bitrate, performance and SSIM) with "Old VBRHQ default" and "VBR P5 2pass-quarter", as you said.
API v9: VBRHQ default
Y:\QSVTest>x64\NVEncC64_5.08.exe -i sakura_op.mpg -o F:\temp\test.mp4
-c hevc --ssim --output-res 1920x1080 --vbrhq 6000
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.08 (r1585) by rigaya, Jul 1 2020 22:41:25 (VC 1926/Win/avx2)
OS Version Windows 10 x64 (18363)
CPU Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA NVENC API 9.1, CUDA 11.0, schedule mode: auto
Input Buffers CUDA, 20 frames
Input Info avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters copyDtoD
ssim (yv12)
Output Info H.265/HEVC main @ Level auto
1920x1080p 1:1 30.000fps (30/1fps)
avwriter: hevc => mp4
Encoder Preset default
Rate Control VBRHQ
Bitrate 6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead off
GOP length 300 frames
B frames 3 frames [ref mode: disabled]
Ref frames 3 frames, MultiRef L0:auto L1:auto
AQ off
CU max / min auto / auto
Others mv:auto
encoded 3501 frames, 289.36 fps, 5379.88 kbps, 74.84 MB
encode time 0:00:12, CPU: 3.1%, GPU: 11.3%, VE: 98.0%, VD: 36.4%, GPUClock: 1858MHz, VEClock: 1725MHz
frame type IDR 12
frame type I 12, avgQP 15.33, total size 1.22 MB
frame type P 875, avgQP 15.97, total size 44.43 MB
frame type B 2614, avgQP 21.37, total size 29.20 MB
ssim/psnr: SSIM YUV: 0.994431 (22.541940), 0.994903 (22.926943), 0.994357 (22.484768), All: 0.994497 (22.594065),
(Frames: 3501)
API v10: VBR P5 2pass-quarter
Y:\QSVTest>x64\NVEncC64.exe -i sakura_op.mpg -o F:\temp\test.mp4
-c hevc --ssim --vbr 6000 --output-res 1920x1080 --multipass 2pass-quarter --preset P5
--------------------------------------------------------------------------------
F:\temp\test.mp4
--------------------------------------------------------------------------------
NVEncC (x64) 5.15 (r1658) by rigaya, Sep 12 2020 23:40:28 (VC 1927/Win/avx2)
OS Version Windows 10 x64 (18363)
CPU Intel Core i9-7980XE @ 2.60GHz [TB: 4.10GHz] (18C/36T)
GPU #0: GeForce RTX 2070 (2304 cores, 1710 MHz)[PCIe3x16][451.67]
NVENC / CUDA NVENC API 10.0, CUDA 11.0, schedule mode: auto
Input Buffers CUDA, 20 frames
Input Info avcuvid: MPEG1, 1280x720, 30/1 fps
Vpp Filters copyDtoD
ssim (yv12)
Output Info H.265/HEVC main @ Level auto
1920x1080p 1:1 30.000fps (30/1fps)
avwriter: hevc => mp4
Encoder Preset P5
Rate Control VBR
Multipass 2pass-quarter
Bitrate 6000 kbps (Max: 11520 kbps)
Target Quality auto
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead off
GOP length 300 frames
B frames 3 frames [ref mode: disabled]
Ref frames 3 frames, MultiRef L0:auto L1:auto
AQ off
CU max / min auto / auto
Others mv:auto repeat-headers
encoded 3501 frames, 289.63 fps, 5379.95 kbps, 74.84 MB
encode time 0:00:12, CPU: 3.1%, GPU: 10.5%, VE: 91.5%, VD: 34.0%, GPUClock: 1860MHz, VEClock: 1727MHz
frame type IDR 12
frame type I 12, avgQP 15.33, total size 1.22 MB
frame type P 875, avgQP 15.97, total size 44.43 MB
frame type B 2614, avgQP 21.37, total size 29.20 MB
ssim/psnr: SSIM YUV: 0.994431 (22.541940), 0.994903 (22.926943), 0.994357 (22.484768), All: 0.994497 (22.594065),
(Frames: 3501)
Thanks for confirming, I guess the speed drop mystery is now solved...:)
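If you have an old command line to migrate, the finding above can be sketched as a small argument rewrite: replace "--vbrhq N" with "--vbr N --multipass 2pass-quarter". This is only an illustration; the function name is made up, and the equivalence was only confirmed in this thread for the default preset against P5, so other presets and options may still behave differently.

```python
def rewrite_vbrhq(args, multipass="2pass-quarter"):
    """Return a copy of args with `--vbrhq N` replaced by
    `--vbr N --multipass <multipass>`; all other arguments are kept as-is.
    """
    out = []
    i = 0
    while i < len(args):
        if args[i] == "--vbrhq" and i + 1 < len(args):
            # Keep the bitrate value and make the multipass mode explicit.
            out += ["--vbr", args[i + 1], "--multipass", multipass]
            i += 2
        else:
            out.append(args[i])
            i += 1
    return out


# Shortened form of the Staxrip command line quoted earlier in this thread.
old = "--vbrhq 2000 --codec h265 --preset quality --profile main10".split()
print(" ".join(rewrite_vbrhq(old)))
```

The preset is deliberately left untouched here, since the thread only established matching presets for specific cases (old quality and P6, old default and P5 with plain VBR).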
I use NVEnc with Staxrip to encode 1080 TV rips. I have a GeForce GTX 1660 with the latest drivers as of today. With 5.12 I noticed a reduction in encoding framerate from approx. 160 fps to 120 fps.