rigaya / NVEnc

Performance experiments on fast encoding with NVENC
https://rigaya34589.blog.fc2.com/blog-category-17.html

HEVC Nvenc low conversion FPS #88

Closed by parhamsan

parhamsan commented 6 years ago

Hey, I am getting low FPS when using an RTX 2080 Ti for NVIDIA H.265 10-bit conversion.

Here is my situation:

I used to have a GTX 1080 Ti, and it worked perfectly with StaxRip 1.7: I was getting around 430-440 FPS when converting a 1080p x264 file to 10-bit x265. I sold that GPU and used a GTX 1050 Ti for a while. With the 1050 Ti 4GB I was getting around 250-270 FPS, which is what I expected. I recently installed an RTX 2080 Ti, and with it I am getting only 170-180 FPS, which is very low considering that the RTX 2080 Ti has more CUDA cores and the same 11GB of memory (GDDR6 instead of GDDR5X). The only things that changed between the 1080 Ti and the 2080 Ti are the NVIDIA driver and the NVEnc version (I am using the latest, 4.22). The rest of the system (Intel i7-7700K CPU, 16GB 3200MHz RAM) has not changed.

I have tried AVSMeter, and I get an average of 430 FPS on the same MKV file. I was wondering if anyone else has the same issue.

Thanks.

parhamsan commented 6 years ago

Does anyone with an RTX-series GPU want to share their conversion FPS for NVIDIA H.265 10-bit CBR? Thanks.

rigaya commented 6 years ago

I don't see a slowdown on my RTX 2070...

Encode Speed (fps)

                            GTX1060     GTX1080     RTX2070
OS                          Win10 x64   Win10 x64   Win10 x64
CPU                         i9 7980XE   R7 1700     i9 7980XE
GPU Driver                  416.34      416.16      416.34
CQP   H.264                 311.69      326.82      448.16
      HEVC                  303.88      313.95      328.20
      HEVC 10bit            294.21      305.85      324.95
      HEVC + B frame        n/a         n/a         285.35
      HEVC 10bit + B frame  n/a         n/a         282.37
VBRQ  H.264                 287.55      320.31      340.69
      HEVC                  246.2       252.25      277.44
      HEVC 10bit            230.76      240.15      272.59
      HEVC + B frame        n/a         n/a         262.22
      HEVC 10bit + B frame  n/a         n/a         260.53

(n/a: HEVC B frames are not supported by Pascal NVENC.)
parhamsan commented 6 years ago

Hey, thanks for the reply. Could you share the file so I can run the same tests?

Also, which versions of StaxRip and NVEnc are you using? I will run some tests with your parameters and post the results.

My specs are: i7-7700K 4.2GHz, 16GB DDR4 3200, Zotac RTX 2080 Ti AMP, Win 10 x64, NVIDIA driver 416.34, StaxRip 1.7, NVEnc 4.22

Thanks.

rigaya commented 6 years ago

I've uploaded the file at the link below.

sample_movie_1080p.mpg (224MB) https://1drv.ms/v/s!AsYziax1Q91rtHqO-kqqfCGOuU6x

I'm using NVEncC 4.23 x64 directly from command line, not via StaxRip.

Environment Info of RTX2070

Y:\Test>x64\NVEncC64.exe --check-environment
NVEncC (x64) 4.23 (r939) by rigaya, Nov  8 2018 19:53:29 (VC 1900/Win/avx2)
  [NVENC API v8.1, CUDA 8.0]
 reader: raw, avi, avs, vpy, avhw [H.264/AVC, H.265/HEVC, MPEG2, VP8, VP9, VC-1, MPEG1, MPEG4]
Environment Info
OS : Windows 10 x64 (17134)
CPU: Intel Core i9-7980XE @ 2.60GHz [TB: 4.11GHz] (18C/36T)
RAM: Used 7359 MB, Total 16077 MB
GPU: #0: GeForce RTX 2070 (4608 cores, 1710 MHz)[PCIe3x16][416.34]

sample log (VBRQ HEVC B frames = 262.22fps)

Y:\Test>x64\NVEncC64.exe -i sample_movie_1080p.mpg -o out.mp4 -c hevc -b 3 --lookahead 32 --aq --vbrhq 0 --vbr-quality 26 --log log.txt
Input #0, mpeg, from '.\sample_movie_1080p.mpg':
  Duration: 00:02:53.61, start: 0.268422, bitrate: 10814 kb/s
    Stream #0:0[0x1e0]: Video: mpeg2video (Main), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
NVEncC (x64) 4.23 (r939) by rigaya, Nov  8 2018 19:53:29 (VC 1900/Win/avx2)
OS Version     Windows 10 x64 (17134)
CPU            Intel Core i9-7980XE @ 2.60GHz [TB: 4.11GHz] (18C/36T)
GPU            #0: GeForce RTX 2070 (4608 cores, 1710 MHz)[PCIe3x16][416.34]
NVENC / CUDA   NVENC API 8.1, CUDA 10.0, schedule mode: auto
Input Buffers  CUDA, 44 frames
Input Info     avcuvid: MPEG2, 1920x1080, 30000/1001 fps
Vpp Filters    copyDtoD
Output Info    H.265/HEVC main @ Level auto
               1920x1080p 1:1 29.970fps (30000/1001fps)
               avwriter: hevc => mp4
Encoder Preset default
Rate Control   VBRHQ
Bitrate        0 kbps (Max: 11520 kbps)
Target Quality 26.00
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      on, 32 frames, Adaptive I, B Insert
GOP length     300 frames
B frames       3 frames
Ref frames     3 frames, LTR: off
AQ             on
CU max / min   auto / auto
Others         mv:auto 
Output #0, mp4, to 'out.mp4':
  Metadata:
    encoding_tool   : NVEncC (x64) 4.23
    encoder         : Lavf58.15.100
    Stream #0:0: Video: hevc (Main) (hev1 / 0x31766568), nv12(progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 120k tbn

[mp4 @ 000001e215e58580] Starting second pass: moving the moov atom to the beginning of the file

encoded 5204 frames, 262.22 fps, 8489.34 kbps, 175.73 MB
encode time 0:00:19, CPU: 2.7%, GPU: 4.8%, VE: 88.1%, GPUClock: 1410MHz, VEClock: 1663MHz
frame type IDR   19
frame type I     19,  avgQP  24.32,  total size   3.52 MB
frame type P   1450,  avgQP  25.13,  total size  93.37 MB
frame type B   3735,  avgQP  26.69,  total size  78.84 MB
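The summary line at the end of these logs ("encoded N frames, X fps, Y kbps, Z MB") is the easiest place to compare runs. A minimal Python sketch for pulling those numbers out of a saved log, assuming the line keeps the format shown above; `parse_summary` and `EncodeSummary` are hypothetical helpers, not part of NVEncC:

```python
import re
from typing import NamedTuple


class EncodeSummary(NamedTuple):
    frames: int
    fps: float
    kbps: float
    size_mb: float


# Matches the final summary line printed by NVEncC in the logs above.
# Assumption: the wording is unchanged between the versions you compare.
SUMMARY_RE = re.compile(
    r"encoded (\d+) frames, ([\d.]+) fps, ([\d.]+) kbps, ([\d.]+) MB"
)


def parse_summary(log_text: str) -> EncodeSummary:
    """Return frames, average fps, bitrate and output size from a log."""
    m = SUMMARY_RE.search(log_text)
    if m is None:
        raise ValueError("no 'encoded ... frames' summary line found")
    return EncodeSummary(int(m[1]), float(m[2]), float(m[3]), float(m[4]))


if __name__ == "__main__":
    line = "encoded 5204 frames, 262.22 fps, 8489.34 kbps, 175.73 MB"
    print(parse_summary(line))
```

Because it uses `search`, the same function also works when the summary is buried in a longer, flattened log.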
parhamsan commented 6 years ago

Thanks for the video. Below are my results:

D:\Test>NVEncC64.exe --check-environment
NVEncC (x64) 4.23 (r939) by rigaya, Nov 8 2018 19:53:29 (VC 1900/Win/avx2)
  [NVENC API v8.1, CUDA 8.0]
 reader: raw, avi, avs, vpy, avhw [H.264/AVC, H.265/HEVC, MPEG2, VP8, VP9, VC-1, MPEG1, MPEG4]
Environment Info
OS : Windows 10 x64 (17134)
CPU: Intel Core i7-7700K @ 4.20GHz [TB: 4.50GHz] (4C/8T)
RAM: Used 4562 MB, Total 16315 MB
GPU: #0: GeForce RTX 2080 Ti (8704 cores, 1665 MHz)[PCIe3x16][416.81]

D:\Test>NVEncC64.exe -i sample_movie_1080p.mpg -o out.mp4 -c hevc -b 3 --lookahead 32 --aq --vbrhq 0 --vbr-quality 26 --log log.txt


NVEncC (x64) 4.23 (r939) by rigaya, Nov 8 2018 19:53:29 (VC 1900/Win/avx2)
OS Version     Windows 10 x64 (17134)
CPU            Intel Core i7-7700K @ 4.20GHz [TB: 4.40GHz] (4C/8T)
GPU            #0: GeForce RTX 2080 Ti (8704 cores, 1665 MHz)[PCIe3x16][416.81]
NVENC / CUDA   NVENC API 8.1, CUDA 10.0, schedule mode: auto
Input Buffers  CUDA, 44 frames
Input Info     avcuvid: MPEG2, 1920x1080, 30000/1001 fps
Vpp Filters    copyDtoD
Output Info    H.265/HEVC main @ Level auto
               1920x1080p 1:1 29.970fps (30000/1001fps)
               avwriter: hevc => mp4
Encoder Preset default
Rate Control   VBRHQ
Bitrate        0 kbps (Max: 11520 kbps)
Target Quality 26.00
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      on, 32 frames, Adaptive I, B Insert
GOP length     300 frames
B frames       3 frames
Ref frames     3 frames, LTR: off
AQ             on
CU max / min   auto / auto
Others         mv:auto

encoded 5204 frames, 284.11 fps, 8489.34 kbps, 175.73 MB
encode time 0:00:18, CPU: 11.8%, GPU: 3.3%, VE: 90.7%, GPUClock: 1994MHz, VEClock: 1846MHz
frame type IDR   19
frame type I     19,  avgQP  24.32,  total size   3.52 MB
frame type P   1450,  avgQP  25.13,  total size  93.37 MB
frame type B   3735,  avgQP  26.69,  total size  78.84 MB

This seems consistent with your results and FPS.

Now here are the settings I use that give me 180 FPS (this run actually went even lower, to about 137 FPS). What am I doing wrong with CBR to get such a low 170-180 FPS?

I prefer CBR:

--cbr 2200 --codec h265 --preset quality --output-depth 10



Input #0, matroska,webm, from 'Outlander.S04E01.America.The.Beautiful.1080p.NF.WEB.DD5.1.x264-NTb.mkv':
  Metadata:
    encoder         : libebml v1.3.5 + libmatroska v1.4.8
    creation_time   : 2018-11-05T20:36:07.000000Z
  Duration: 01:02:27.87, start: 0.000000, bitrate: 10672 kb/s
    Stream #0:0(eng): Video: h264 (High), yuv420p(progressive), 1920x1080, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
    Metadata:
      BPS             : 10285966
      BPS-eng         : 10285966
      DURATION        : 01:02:27.828000000
      DURATION-eng    : 01:02:27.828000000
      NUMBER_OF_FRAMES: 89858
      NUMBER_OF_FRAMES-eng: 89858
      NUMBER_OF_BYTES : 4818754023
      NUMBER_OF_BYTES-eng: 4818754023
      _STATISTICS_WRITING_APP: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2018-11-05 20:36:07
      _STATISTICS_WRITING_DATE_UTC-eng: 2018-11-05 20:36:07
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:1(eng): Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s (default)
    Metadata:
      title           : English
      BPS             : 384000
      BPS-eng         : 384000
      DURATION        : 01:02:27.872000000
      DURATION-eng    : 01:02:27.872000000
      NUMBER_OF_FRAMES: 117121
      NUMBER_OF_FRAMES-eng: 117121
      NUMBER_OF_BYTES : 179897856
      NUMBER_OF_BYTES-eng: 179897856
      _STATISTICS_WRITING_APP: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2018-11-05 20:36:07
      _STATISTICS_WRITING_DATE_UTC-eng: 2018-11-05 20:36:07
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:2(eng): Subtitle: subrip
    Metadata:
      title           : SDH
      BPS             : 65
      BPS-eng         : 65
      DURATION        : 01:00:46.768000000
      DURATION-eng    : 01:00:46.768000000
      NUMBER_OF_FRAMES: 929
      NUMBER_OF_FRAMES-eng: 929
      NUMBER_OF_BYTES : 29900
      NUMBER_OF_BYTES-eng: 29900
      _STATISTICS_WRITING_APP: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2018-11-05 20:36:07
      _STATISTICS_WRITING_DATE_UTC-eng: 2018-11-05 20:36:07
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:3(ara): Subtitle: subrip
    Metadata:
      BPS             : 92
      BPS-eng         : 92
      DURATION        : 00:58:10.862000000
      DURATION-eng    : 00:58:10.862000000
      NUMBER_OF_FRAMES: 714
      NUMBER_OF_FRAMES-eng: 714
      NUMBER_OF_BYTES : 40367
      NUMBER_OF_BYTES-eng: 40367
      _STATISTICS_WRITING_APP: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2018-11-05 20:36:07
      _STATISTICS_WRITING_DATE_UTC-eng: 2018-11-05 20:36:07
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:4(ger): Subtitle: subrip
    Metadata:
      BPS             : 59
      BPS-eng         : 59
      DURATION        : 00:58:11.488000000
      DURATION-eng    : 00:58:11.488000000
      NUMBER_OF_FRAMES: 674
      NUMBER_OF_FRAMES-eng: 674
      NUMBER_OF_BYTES : 25992
      NUMBER_OF_BYTES-eng: 25992
      _STATISTICS_WRITING_APP: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2018-11-05 20:36:07
      _STATISTICS_WRITING_DATE_UTC-eng: 2018-11-05 20:36:07
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:5(spa): Subtitle: subrip
    Metadata:
      title           : European
      BPS             : 54
      BPS-eng         : 54
      DURATION        : 00:58:10.862000000
      DURATION-eng    : 00:58:10.862000000
      NUMBER_OF_FRAMES: 687
      NUMBER_OF_FRAMES-eng: 687
      NUMBER_OF_BYTES : 23936
      NUMBER_OF_BYTES-eng: 23936
      _STATISTICS_WRITING_APP: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2018-11-05 20:36:07
      _STATISTICS_WRITING_DATE_UTC-eng: 2018-11-05 20:36:07
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:6(fre): Subtitle: subrip
    Metadata:
      BPS             : 60
      BPS-eng         : 60
      DURATION        : 00:58:10.862000000
      DURATION-eng    : 00:58:10.862000000
      NUMBER_OF_FRAMES: 711
      NUMBER_OF_FRAMES-eng: 711
      NUMBER_OF_BYTES : 26274
      NUMBER_OF_BYTES-eng: 26274
      _STATISTICS_WRITING_APP: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v19.0.0 ('Brave Captain') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2018-11-05 20:36:07
      _STATISTICS_WRITING_DATE_UTC-eng: 2018-11-05 20:36:07
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
NVEncC (x64) 4.23 (r939) by rigaya, Nov 8 2018 19:53:29 (VC 1900/Win/avx2)
OS Version     Windows 10 x64 (17134)
CPU            Intel Core i7-7700K @ 4.20GHz [TB: 4.40GHz] (4C/8T)
GPU            #0: GeForce RTX 2080 Ti (8704 cores, 1665 MHz)[PCIe3x16][416.81]
NVENC / CUDA   NVENC API 8.1, CUDA 10.0, schedule mode: auto
Input Buffers  CUDA, 36 frames
Input Info     avcuvid: H.264/AVC, 1920x1080, 24000/1001 fps
Vpp Filters    cspconv(nv12 -> p010)
Output Info    H.265/HEVC main10 @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
               avwriter: hevc => matroska
Encoder Preset quality
Rate Control   CBR
Bitrate        2200 kbps (Max: 2200 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     240 frames
B frames       0 frames
Ref frames     3 frames, LTR: off
AQ             off
CU max / min   auto / auto
Others         mv:auto
Output #0, matroska, to 'out.mkv':
  Metadata:
    encoding_tool   : NVEncC (x64) 4.23
    encoder         : Lavf58.15.100
    Stream #0:0(eng): Video: hevc (Main 10), yuv420p10le(progressive), 1920x1080, q=2-31, 1k tbn (default)

encoded 89858 frames, 137.31 fps, 1868.78 kbps, 834.93 MB
encode time 0:10:54, CPU: 9.7%, GPU: 2.2%, VE: 97.7%, GPUClock: 2008MHz, VEClock: 1860MHz
frame type IDR    375
frame type I      375,  avgQP  20.77,  total size  18.49 MB
frame type P    89483,  avgQP  22.55,  total size 816.44 MB

rigaya commented 6 years ago

I've tested with your command line and got around 160 fps. If you remove "--preset quality", I think you'll get over 300 fps.

It seems the driver uses a "heavier" setting on Turing (RTX 20) when "--preset quality" is used. To get performance comparable to Pascal (GTX 10), please consider removing "--preset quality".
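To compare both settings on the same source, the two command lines differ only in whether `--preset quality` is present. A small Python sketch that builds the argument list used in this thread (`--cbr`, `--codec h265`, `--output-depth 10`) with the preset made optional; `build_nvencc_args` is a hypothetical helper and the file names are placeholders:

```python
from typing import List, Optional


def build_nvencc_args(src: str, dst: str, cbr_kbps: int,
                      preset: Optional[str] = None,
                      exe: str = "NVEncC64.exe") -> List[str]:
    """Build the CBR / HEVC / 10-bit NVEncC command line used in this thread.

    preset=None leaves --preset out, so NVEncC falls back to its default
    preset (the "Encoder Preset default" case in the logs).
    """
    args = [exe, "--cbr", str(cbr_kbps), "--codec", "h265"]
    if preset is not None:
        args += ["--preset", preset]
    args += ["--output-depth", "10", "-i", src, "-o", dst]
    return args


if __name__ == "__main__":
    # Placeholder file names; substitute your own source and output.
    for preset in ("quality", None):
        print(" ".join(build_nvencc_args("input.avs", "out.h265", 2200, preset)))
```

Either list can be passed to `subprocess.run(args, check=True)` to launch the encode, which keeps every other option identical between the two runs.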

parhamsan commented 6 years ago

Now here is an interesting experiment:

I installed both my GPUs in my machine:

1- RTX 2080Ti (main)
2- GTX 1050Ti

I encoded the same file with the same settings on both cards using StaxRip: first I disabled the RTX 2080 Ti and encoded with the 1050 Ti, then I disabled the 1050 Ti and encoded with the 2080 Ti. The results below are surprising:

Using 1050Ti:

C:\Users\PARHAM\Desktop\StaxRip-x64-1.7\Apps\NVEnc\NVEncC64.exe --cbr 2475 --codec h265 --preset quality --output-depth 10 -i D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new.avs -o D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new_out.h265

NVEncC (x64) 4.23 (r939) by rigaya, Nov 8 2018 19:53:29 (VC 1900/Win/avx2)
OS Version     Windows 10 x64 (17134)
CPU            Intel Core i7-7700K @ 4.20GHz [TB: 4.40GHz] (4C/8T)
GPU            #0: GeForce GTX 1050 Ti (768 cores, 1392 MHz)[PCIe3x16][416.81]
NVENC / CUDA   NVENC API 8.1, CUDA 10.0, schedule mode: auto
Input Buffers  CUDA, 36 frames
Input Info     Avisynth+ 2.60(yv12)->p010 [AVX2], 1920x1080, 24000/1001 fps
Vpp Filters    copyHtoD
Output Info    H.265/HEVC main10 @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
Encoder Preset quality
Rate Control   CBR
Bitrate        2475 kbps (Max: 2475 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     240 frames
B frames       0 frames
Ref frames     3 frames, LTR: off
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 55035 frames, 293.58 fps, 2106.71 kbps, 576.47 MB
encode time 0:03:07, CPU: 52.3, GPU: 34.4, VE: 95.4, GPUClock: 1737MHz, VEClock: 1486MHz
frame type IDR   230
frame type I     230,  total size  11.07 MB
frame type P   54805,  total size 565.40 MB

Start: 8:28:29 AM End: 8:31:39 AM Duration: 00:03:10

General
Complete name        : D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new_out.h265
Format               : HEVC
Format/Info          : High Efficiency Video Coding
File size            : 576 MiB

Video
Format               : HEVC
Format/Info          : High Efficiency Video Coding
Format profile       : Main 10@L4@Main
Width                : 1 920 pixels
Height               : 1 080 pixels
Display aspect ratio : 16:9
Frame rate           : 23.976 (24000/1001) FPS
Color space          : YUV
Chroma subsampling   : 4:2:0
Bit depth            : 10 bits

Using RTX 2080Ti:

C:\Users\PARHAM\Desktop\StaxRip-x64-1.7\Apps\NVEnc\NVEncC64.exe --cbr 2475 --codec h265 --preset quality --output-depth 10 -i D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new.avs -o D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new_out.h265

NVEncC (x64) 4.23 (r939) by rigaya, Nov 8 2018 19:53:29 (VC 1900/Win/avx2)
OS Version     Windows 10 x64 (17134)
CPU            Intel Core i7-7700K @ 4.20GHz [TB: 4.40GHz] (4C/8T)
GPU            #0: GeForce RTX 2080 Ti (8704 cores, 1665 MHz)[PCIe3x16][416.81]
NVENC / CUDA   NVENC API 8.1, CUDA 10.0, schedule mode: auto
Input Buffers  CUDA, 36 frames
Input Info     Avisynth+ 2.60(yv12)->p010 [AVX2], 1920x1080, 24000/1001 fps
Vpp Filters    copyHtoD
Output Info    H.265/HEVC main10 @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
Encoder Preset quality
Rate Control   CBR
Bitrate        2475 kbps (Max: 2475 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     240 frames
B frames       0 frames
Ref frames     3 frames, LTR: off
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 55035 frames, 177.42 fps, 2106.05 kbps, 576.29 MB
encode time 0:05:10, CPU: 32.3, GPU: 19.1, VE: 97.8, GPUClock: 2011MHz, VEClock: 1862MHz
frame type IDR   230
frame type I     230,  total size  11.36 MB
frame type P   54805,  total size 564.93 MB

Start: 7:27:31 AM End: 7:32:44 AM Duration: 00:05:13

General
Complete name        : D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new_out.h265
Format               : HEVC
Format/Info          : High Efficiency Video Coding
File size            : 576 MiB

Video
Format               : HEVC
Format/Info          : High Efficiency Video Coding
Format profile       : Main 10@L4@Main
Width                : 1 920 pixels
Height               : 1 080 pixels
Display aspect ratio : 16:9
Frame rate           : 23.976 (24000/1001) FPS
Color space          : YUV
Chroma subsampling   : 4:2:0
Bit depth            : 10 bits

As you can see, the 1050Ti encodes the same file at an average of 293.58 fps:

encoded 55035 frames, 293.58 fps, 2106.71 kbps, 576.47 MB

The 2080Ti, on the other hand, averages only 177.42 fps:

encoded 55035 frames, 177.42 fps, 2106.05 kbps, 576.29 MB
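A quick sanity check on these two runs: the average fps should equal frames divided by the encode time. This Python sketch compares frames / fps with the "encode time" values from the two logs above (the helper names are illustrative only):

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an NVEncC 'encode time' value such as '0:05:10' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def predicted_seconds(frames: int, fps: float) -> float:
    """Encode time implied by an average frame rate."""
    return frames / fps


# (frames, average fps, logged encode time) taken from the two logs above.
RUNS = {
    "GTX 1050 Ti, --preset quality": (55035, 293.58, "0:03:07"),
    "RTX 2080 Ti, --preset quality": (55035, 177.42, "0:05:10"),
}

if __name__ == "__main__":
    for name, (frames, fps, logged) in RUNS.items():
        print(f"{name}: {predicted_seconds(frames, fps):.1f} s predicted, "
              f"{hms_to_seconds(logged)} s logged")
    print(f"1050 Ti / 2080 Ti fps ratio: {293.58 / 177.42:.2f}")
```

Both predictions land within a second of the logged times (about 187 s vs 0:03:07 and 310 s vs 0:05:10), so the reported fps is a whole-run average, and with "--preset quality" the 1050 Ti really is about 1.65x faster on this file.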

Do you know what might cause the 2080Ti to have a lower conversion FPS?

I encode tons of files every day, and I'm getting to the point of returning the RTX 2080 Ti and going back to a 1080 Ti, with which I was getting 450-460 FPS converting x264 to 10-bit x265.

Please please help!!!!

rigaya commented 6 years ago

Our comments may have crossed, so I'll write it again.

I've tested with your command line and got around 160 fps. If you remove "--preset quality", I think you'll get over 300 fps.

It seems the driver uses a "heavier" setting on Turing (RTX 20) when "--preset quality" is used. To get performance comparable to Pascal (GTX 10), please consider removing "--preset quality".

parhamsan commented 6 years ago

Thanks, removing "--preset quality" helped a lot!

Here are the results:

Using 1050Ti:

C:\Users\PARHAM\Desktop\StaxRip-x64-1.7\Apps\NVEnc\NVEncC64.exe --cbr 2475 --codec h265 --output-depth 10 -i D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new.avs -o D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new_out.h265

NVEncC (x64) 4.23 (r939) by rigaya, Nov 8 2018 19:53:29 (VC 1900/Win/avx2)
OS Version     Windows 10 x64 (17134)
CPU            Intel Core i7-7700K @ 4.20GHz [TB: 4.50GHz] (4C/8T)
GPU            #0: GeForce GTX 1050 Ti (768 cores, 1392 MHz)[PCIe3x16][416.81]
NVENC / CUDA   NVENC API 8.1, CUDA 10.0, schedule mode: auto
Input Buffers  CUDA, 36 frames
Input Info     Avisynth+ 2.60(yv12)->p010 [AVX2], 1920x1080, 24000/1001 fps
Vpp Filters    copyHtoD
Output Info    H.265/HEVC main10 @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
Encoder Preset default
Rate Control   CBR
Bitrate        2475 kbps (Max: 2475 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     240 frames
B frames       0 frames
Ref frames     3 frames, LTR: off
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 55035 frames, 323.28 fps, 2104.85 kbps, 575.96 MB
encode time 0:02:50, CPU: 57.9, GPU: 38.4, VE: 92.0, GPUClock: 1738MHz, VEClock: 1488MHz
frame type IDR   230
frame type I     230,  total size  11.39 MB
frame type P   54805,  total size 564.57 MB

Start: 8:56:10 AM End: 8:59:03 AM Duration: 00:02:52

General
Complete name        : D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new_out.h265
Format               : HEVC
Format/Info          : High Efficiency Video Coding
File size            : 576 MiB

Video
Format               : HEVC
Format/Info          : High Efficiency Video Coding
Format profile       : Main 10@L4@Main
Width                : 1 920 pixels
Height               : 1 080 pixels
Display aspect ratio : 16:9
Frame rate           : 23.976 (24000/1001) FPS
Color space          : YUV
Chroma subsampling   : 4:2:0
Bit depth            : 10 bits

Using RTX 2080Ti:

C:\Users\PARHAM\Desktop\StaxRip-x64-1.7\Apps\NVEnc\NVEncC64.exe --cbr 2475 --codec h265 --output-depth 10 -i D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new.avs -o D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new_out.h265

NVEncC (x64) 4.23 (r939) by rigaya, Nov 8 2018 19:53:29 (VC 1900/Win/avx2)
OS Version     Windows 10 x64 (17134)
CPU            Intel Core i7-7700K @ 4.20GHz [TB: 4.40GHz] (4C/8T)
GPU            #0: GeForce RTX 2080 Ti (8704 cores, 1665 MHz)[PCIe3x16][416.81]
NVENC / CUDA   NVENC API 8.1, CUDA 10.0, schedule mode: auto
Input Buffers  CUDA, 36 frames
Input Info     Avisynth+ 2.60(yv12)->p010 [AVX2], 1920x1080, 24000/1001 fps
Vpp Filters    copyHtoD
Output Info    H.265/HEVC main10 @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
Encoder Preset default
Rate Control   CBR
Bitrate        2475 kbps (Max: 2475 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     240 frames
B frames       0 frames
Ref frames     3 frames, LTR: off
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 55035 frames, 345.54 fps, 2106.63 kbps, 576.45 MB
encode time 0:02:39, CPU: 62.2, GPU: 40.6, VE: 90.9, GPUClock: 2010MHz, VEClock: 1785MHz
frame type IDR   230
frame type I     230,  total size  11.81 MB
frame type P   54805,  total size 564.64 MB

Start: 8:52:40 AM End: 8:55:22 AM Duration: 00:02:42

General
Complete name        : D:\Downloads\American.Horror..._temp\American.Horror.Story.S08E09.Fire.and.Reign.1080p.AMZN.WEBRip.DD5.1.x264-NTb_new_out.h265
Format               : HEVC
Format/Info          : High Efficiency Video Coding
File size            : 576 MiB

Video
Format               : HEVC
Format/Info          : High Efficiency Video Coding
Format profile       : Main 10@L4@Main
Width                : 1 920 pixels
Height               : 1 080 pixels
Display aspect ratio : 16:9
Frame rate           : 23.976 (24000/1001) FPS
Color space          : YUV
Chroma subsampling   : 4:2:0
Bit depth            : 10 bits

I really appreciate your help!!!

It's still surprising that the 1050Ti and 2080Ti results are so close, isn't it?

rigaya commented 6 years ago

It might be that the video engine used for encoding is mostly the same, regardless of the GPU's other specs.

But yeah, we'd expect more performance from the RTX 2080 Ti... (8704 cores!)

tabnk commented 6 years ago

Update from NVIDIA.

https://developer.nvidia.com/video-encode-decode-gpu-support-matrix

It looks like NVENC encoding on Turing is roughly TWICE as fast as on Pascal, but the NEW SDK is required.

1* The video encoder in Turing GPUs has substantially improved quality and performance compared with Pascal. The overall encoding capacity of one NVENC in Turing is comparable to two NVENC’s in Pascal.

** The Video Codec SDK, which exposes new encoder improvements and features of Turing will be released soon. Until then, users can continue to use Video SDK 8.2 on all GPUs.

parhamsan commented 6 years ago

Hey,

So I guess for now we just have to wait for an update and a new Video Codec SDK release from NVIDIA.

Appreciate your update.

tabnk commented 5 years ago

Nvidia Video Codec SDK 9.0 coming soon https://developer.nvidia.com/nvidia-video-codec-sdk

parhamsan commented 5 years ago

Thanks for the update!!!

deinlandel commented 5 years ago

I have basically the same situation as the topic starter: video encoding performance is much worse on my RTX 2080 Ti than on my old GTX 1080. I tried your latest release, which claims to support NVENC SDK, with the same results.

rigaya commented 5 years ago

Have you tried changing (or removing) the "--preset" option? On Pascal, "--preset" had little effect on performance, but on Turing it has a major influence. Turing GPUs deliver better quality than Pascal, but the performance might turn out to be lower. You need to re-adjust the options to get the balance you want between speed and quality.

parhamsan commented 5 years ago

Hey, so I did some more tests with and without the "--preset quality" option, and it makes a big difference, as you mentioned in the previous posts. I ran the tests on two PCs (one with an RTX 2080 Ti, the other with a GTX 1050 Ti). See the results below:

1- PC with RTX 2080Ti, i9-9900K

Test 1 (--preset quality removed):

C:\Users\USER\Desktop\StaxRip-x64-1.7\Apps\NVEnc\NVEncC64.exe --cbr 2250 --codec h265 --output-depth 10 -i D:\Downloads\Black.Monday.S0..._temp\Black.Monday.S01E09.2.1080p.AMZN.WEBRip.DD5.1.x264-monkee_new.avs -o D:\Downloads\Black.Monday.S0..._temp\Black.Monday.S01E09.2.1080p.AMZN.WEBRip.DD5.1.x264-monkee_new_out.h265

NVEncC (x64) 4.34 (r1038) by rigaya, Mar 21 2019 00:01:48 (VC 1900/Win/avx2)
OS Version     Windows 10 x64 (17763)
CPU            Intel Core i9-9900K @ 3.60GHz [TB: 5.00GHz] (8C/16T)
GPU            #0: GeForce RTX 2080 Ti (4352 cores, 1665 MHz)[PCIe3x16][419.67]
NVENC / CUDA   NVENC API 9.0, CUDA 10.1, schedule mode: auto
Input Buffers  CUDA, 36 frames
Input Info     Avisynth+ 2.60(yv12)->nv12 [AVX2], 1920x1080, 24000/1001 fps
Vpp Filters    copyHtoD cspconv(nv12 -> p010)
Output Info    H.265/HEVC main10 @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
Encoder Preset default
Rate Control   CBR
Bitrate        2250 kbps (Max: 2250 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     240 frames
B frames       3 frames [ref mode: disabled]
Ref frames     3 frames, LTR: off
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 42418 frames, 330.82 fps, 2135.40 kbps, 450.36 MB
encode time 0:02:08, CPU: 25.9, GPU: 14.6, VE: 96.6, GPUClock: 2006MHz, VEClock: 1856MHz
frame type IDR   177
frame type I     177,  total size  18.08 MB
frame type P   10605,  total size 357.17 MB
frame type B   31636,  total size  75.11 MB

Start: 8:16:20 AM End: 8:18:30 AM Duration: 00:02:09

Test 2 (with --preset quality):

C:\Users\USER\Desktop\StaxRip-x64-1.7\Apps\NVEnc\NVEncC64.exe --cbr 2250 --codec h265 --preset quality --output-depth 10 -i D:\Downloads\Black.Monday.S0..._temp\Black.Monday.S01E09.2.1080p.AMZN.WEBRip.DD5.1.x264-monkee_new.avs -o D:\Downloads\Black.Monday.S0..._temp\Black.Monday.S01E09.2.1080p.AMZN.WEBRip.DD5.1.x264-monkee_new_out.h265

NVEncC (x64) 4.34 (r1038) by rigaya, Mar 21 2019 00:01:48 (VC 1900/Win/avx2)
OS Version     Windows 10 x64 (17763)
CPU            Intel Core i9-9900K @ 3.60GHz [TB: 5.00GHz] (8C/16T)
GPU            #0: GeForce RTX 2080 Ti (4352 cores, 1665 MHz)[PCIe3x16][419.67]
NVENC / CUDA   NVENC API 9.0, CUDA 10.1, schedule mode: auto
Input Buffers  CUDA, 36 frames
Input Info     Avisynth+ 2.60(yv12)->nv12 [AVX2], 1920x1080, 24000/1001 fps
Vpp Filters    copyHtoD cspconv(nv12 -> p010)
Output Info    H.265/HEVC main10 @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
Encoder Preset quality
Rate Control   CBR
Bitrate        2250 kbps (Max: 2250 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     240 frames
B frames       3 frames [ref mode: disabled]
Ref frames     3 frames, LTR: off
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 42418 frames, 183.32 fps, 2133.75 kbps, 450.02 MB
encode time 0:03:51, CPU: 16.7, GPU: 10.4, VE: 98.3, GPUClock: 2013MHz, VEClock: 1864MHz
frame type IDR   177
frame type I     177,  total size  18.08 MB
frame type P   10605,  total size 355.89 MB
frame type B   31636,  total size  76.04 MB

Start: 8:11:49 AM End: 8:15:43 AM Duration: 00:03:53

2- PC with GTX 1050 Ti, i5-4690K

Test 1 (--preset quality removed):

C:\Users\USER\Desktop\StaxRip-x64-1.7\Apps\NVEnc\NVEncC64.exe --cbr 2250 --codec h265 --output-depth 10 -i C:\Users\USER\Desktop\Black.Monday.S0..._temp\Black.Monday.S01E09.2.1080p.AMZN.WEBRip.DD5.1.x264-monkee_new.avs -o C:\Users\Parham\Desktop\Black.Monday.S0..._temp\Black.Monday.S01E09.2.1080p.AMZN.WEBRip.DD5.1.x264-monkee_new_out.h265

Max B frames are 0 frames.
NVEncC (x64) 4.34 (r1038) by rigaya, Mar 21 2019 00:01:48 (VC 1900/Win/avx2)
OS Version     Windows Server 2012 R2 x64 (9600)
CPU            Intel Core i5-4690K @ 3.50GHz (4C/4T)
GPU            #0: GeForce GTX 1050 Ti (768 cores, 1392 MHz)[PCIe3x16][419.67]
NVENC / CUDA   NVENC API 9.0, CUDA 10.1, schedule mode: auto
Input Buffers  CUDA, 36 frames
Input Info     Avisynth+ 2.60(yv12)->nv12 [AVX2], 1920x1080, 24000/1001 fps
Vpp Filters    copyHtoD cspconv(nv12 -> p010)
Output Info    H.265/HEVC main10 @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
Encoder Preset default
Rate Control   CBR
Bitrate        2250 kbps (Max: 2250 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     240 frames
B frames       0 frames [ref mode: disabled]
Ref frames     3 frames, LTR: off
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 42418 frames, 237.83 fps, 2181.49 kbps, 460.08 MB
encode time 0:02:58, CPU: 92.4, GPU: 21.8, VE: 67.9, GPUClock: 1741MHz, VEClock: 1564MHz
frame type IDR   177
frame type I     177,  total size  17.21 MB
frame type P   42241,  total size 442.87 MB

Start: 8:18:05 AM End: 8:21:09 AM Duration: 00:03:04

Test 2 (with --preset quality):

C:\Users\USER\Desktop\StaxRip-x64-1.7\Apps\NVEnc\NVEncC64.exe --cbr 2250 --codec h265 --preset quality --output-depth 10 -i C:\Users\USER\Desktop\Black.Monday.S0..._temp\Black.Monday.S01E09.2.1080p.AMZN.WEBRip.DD5.1.x264-monkee_new.avs -o C:\Users\Parham\Desktop\Black.Monday.S0..._temp\Black.Monday.S01E09.2.1080p.AMZN.WEBRip.DD5.1.x264-monkee_new_out.h265

Max B frames are 0 frames.
NVEncC (x64) 4.34 (r1038) by rigaya, Mar 21 2019 00:01:48 (VC 1900/Win/avx2)
OS Version     Windows Server 2012 R2 x64 (9600)
CPU            Intel Core i5-4690K @ 3.50GHz (4C/4T)
GPU            #0: GeForce GTX 1050 Ti (768 cores, 1392 MHz)[PCIe3x16][419.67]
NVENC / CUDA   NVENC API 9.0, CUDA 10.1, schedule mode: auto
Input Buffers  CUDA, 36 frames
Input Info     Avisynth+ 2.60(yv12)->nv12 [AVX2], 1920x1080, 24000/1001 fps
Vpp Filters    copyHtoD cspconv(nv12 -> p010)
Output Info    H.265/HEVC main10 @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
Encoder Preset quality
Rate Control   CBR
Bitrate        2250 kbps (Max: 2250 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
VBV buf size   auto
Lookahead      off
GOP length     240 frames
B frames       0 frames [ref mode: disabled]
Ref frames     3 frames, LTR: off
AQ             off
CU max / min   auto / auto
Others         mv:auto

encoded 42418 frames, 226.36 fps, 2181.30 kbps, 460.04 MB
encode time 0:03:07, CPU: 92.0, GPU: 22.6, VE: 76.6, GPUClock: 1746MHz, VEClock: 1569MHz
frame type IDR   177
frame type I     177,  total size  17.09 MB
frame type P   42241,  total size 442.95 MB

Start: 8:13:29 AM End: 8:16:42 AM Duration: 00:03:13

My Findings:

On Pascal-based GPUs (in my case a GTX 1050 Ti), the "--preset quality" option does not have a big impact on encoding performance (frame rate), whereas on Turing-based GPUs (in my case an RTX 2080 Ti), removing "--preset quality" almost doubled the fps when converting an x264 file to 10-bit x265.

Summary based on GPU and fps:

[image: summary chart of fps by GPU and preset, not preserved]
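The same comparison can be recomputed from the four runs above. A Python sketch of how much throughput "--preset quality" costs on each GPU, using the fps values from the logs; note the two PCs also differ in CPU and OS, so only the within-GPU ratios are directly comparable:

```python
# Average fps from the four runs above: (GPU, preset) -> fps.
RESULTS = {
    ("RTX 2080 Ti", "default"): 330.82,
    ("RTX 2080 Ti", "quality"): 183.32,
    ("GTX 1050 Ti", "default"): 237.83,
    ("GTX 1050 Ti", "quality"): 226.36,
}


def quality_preset_cost(gpu: str) -> float:
    """Fraction of throughput lost when --preset quality is added."""
    return 1.0 - RESULTS[(gpu, "quality")] / RESULTS[(gpu, "default")]


if __name__ == "__main__":
    for gpu in ("RTX 2080 Ti", "GTX 1050 Ti"):
        default_fps = RESULTS[(gpu, "default")]
        quality_fps = RESULTS[(gpu, "quality")]
        print(f"{gpu}: {default_fps} -> {quality_fps} fps "
              f"({quality_preset_cost(gpu):.1%} slower with --preset quality)")
```

With these numbers the quality preset costs about 45% of throughput on the RTX 2080 Ti but only about 5% on the GTX 1050 Ti.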

I have also noticed that on the Turing GPU the VE (Video Engine) usage is very high, close to 99% all the time, when the "--preset quality" option is enabled.

When "--preset quality" is removed, CPU usage seems to go higher, but VE usage still stays above 90%.

On the Pascal GPU the VE usage is definitely not maxing out (less than 80%).
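These load figures can be read straight from the "encode time" line of each log. A Python sketch that extracts the CPU, GPU and VE percentages and flags a run as VE-bound; the 90% threshold comes from the observation above and is a heuristic assumption, not an NVEncC rule:

```python
import re
from typing import Dict

# Matches "CPU: 16.7", "VE: 98.3%" and so on. The "%" is optional because the
# StaxRip-launched logs above omit it. "GPUClock:" and "VEClock:" are not
# matched, since the name is not followed directly by ": ".
USAGE_RE = re.compile(r"\b(CPU|GPU|VE): ([\d.]+)%?")


def parse_usage(line: str) -> Dict[str, float]:
    """Return the CPU, GPU and VE (video engine) loads from an 'encode time' line."""
    return {name: float(value) for name, value in USAGE_RE.findall(line)}


def looks_ve_bound(usage: Dict[str, float], threshold: float = 90.0) -> bool:
    """Heuristic: a VE load at or above the threshold suggests the encoder
    engine itself, not the CPU or CUDA cores, is the bottleneck."""
    return usage.get("VE", 0.0) >= threshold


if __name__ == "__main__":
    turing_quality = ("encode time 0:03:51, CPU: 16.7, GPU: 10.4, VE: 98.3, "
                      "GPUClock: 2013MHz, VEClock: 1864MHz")
    pascal_default = ("encode time 0:02:58, CPU: 92.4, GPU: 21.8, VE: 67.9, "
                      "GPUClock: 1741MHz, VEClock: 1564MHz")
    for line in (turing_quality, pascal_default):
        usage = parse_usage(line)
        print(usage, "VE-bound" if looks_ve_bound(usage) else "not VE-bound")
```

In the 1050 Ti run the CPU is at 92.4% on a 4C/4T i5-4690K while VE sits at 67.9%, which hints that the input side, rather than the encoder, may be the limit on that PC.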

I still believe that the Turing GPU should perform better than Pascal-based GPUs, and in these tests it does.

I will do more tests and try the 1050Ti in the same PC as the 9900K to make the comparison more relevant. I will also compare images from the files encoded with and without "--preset quality" to see whether the option really matters.

Thanks again for the follow up.

bigdwg71 commented 5 years ago

@rigaya I just got myself a Turing GPU, so I am testing some of these out on my 4K rips. Can you tell me what "--preset quality" does? Is it comparable to slow or medium as defined in the NVENC SDK? I couldn't find any mention of quality and performance in the SDK docs, so I don't understand what it is actually doing. Also, are there other, undocumented presets? Is slow or medium an option?

My testing shows results similar to what @parhamsan showed above: the quality preset drops fps by roughly 50%. I can't tell yet whether the quality is actually affected; I might need to move to 1080p rips to check that. But if you can tell us what is actually happening in the background, that might make it easier for me to decide whether to use it. Thanks in advance!

rigaya commented 5 years ago

The "--preset" option is mapped to the presetGUID, which is documented in "3.3 ENCODER PRESET CONFIGURATIONS" in the NVENC SDK docs. I assume that some of the options are overridden internally when this preset is set.

When lossless encoding is off,

You might need to measure SSIM if you actually want to check the difference in quality.
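To make the SSIM suggestion concrete, here is a minimal Python sketch of the SSIM formula applied to two luma planes given as flat lists of pixel values. It is a simplification assumed for illustration: the standard metric averages this index over small local (typically Gaussian) windows, whereas `global_ssim` (a hypothetical helper) uses one window spanning the whole frame. For real comparisons, a tool that implements windowed SSIM, such as ffmpeg's `ssim` filter, is the safer choice.

```python
from statistics import fmean


def global_ssim(ref, dist, data_range=255.0):
    """Single-window SSIM between two equal-size planes (flat lists of pixels).

    Simplification: standard SSIM averages this index over local windows;
    here one window covers the whole plane.
    """
    if len(ref) != len(dist) or not ref:
        raise ValueError("planes must be non-empty and the same size")
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = fmean(ref), fmean(dist)
    var_x = fmean((v - mu_x) ** 2 for v in ref)  # population variance
    var_y = fmean((v - mu_y) ** 2 for v in dist)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(ref, dist))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


if __name__ == "__main__":
    ref = [16, 40, 80, 120, 200, 235]
    dist = [18, 38, 85, 110, 205, 230]
    print(global_ssim(ref, dist))
```

For 10-bit output, `data_range=1023` would be the matching pixel range. Values closer to 1.0 mean the encode is structurally closer to the reference.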

bigdwg71 commented 5 years ago

Thanks, @rigaya! This is great! The SDK doc seems to imply that the actual preset options are defined per card/chipset, and it doesn't document all of the preset options:

"3.3.2 Selecting encoder preset configuration" "1. Enumerate the supported presets as described above, in Section 3.3.1."

Is there an nvencc command to show the supported presets? Looking at the links you sent, I see mention of a BD preset. From doing some reading, I also see that the NVENC API offers single-pass and two-pass modes for the HQ preset. Do you know which one the quality preset uses? I don't see a separate flag to choose single or two-pass. Is that determined by the rate control mode (vbr, vbrhq, etc.)?

Sorry for all the questions, but I really do appreciate your time!

rigaya commented 5 years ago

Is there an nvencc command to show the supported presets?

Unfortunately not. Instead of listing the presets, NVEncC maps some of them to command line options.

The presets exposed on the NVEncC command line are listed in the table below. The "--bluray" option in NVEncC sets individual options such as slices, refs, GOP, etc. rather than using the BD preset.

command line | GUID
--preset performance | NV_ENC_PRESET_HP_GUID
--preset default | NV_ENC_PRESET_DEFAULT_GUID
--preset quality | NV_ENC_PRESET_HQ_GUID
--preset performance --lossless | NV_ENC_PRESET_LOSSLESS_DEFAULT_GUID
--preset default --lossless | NV_ENC_PRESET_LOSSLESS_DEFAULT_GUID
--preset quality --lossless | NV_ENC_PRESET_LOSSLESS_HP_GUID

Single-pass / 2-pass mode depends on the rate control setting, as you mentioned. 2-pass is activated by "--cbrhq"/"--vbrhq", which were formerly called 2-pass modes in the NVENC API.
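The answers above describe two independent choices: the preset selects the presetGUID, while the rate control option decides single-pass vs. 2-pass. A minimal Python sketch of that relationship, with the mapping transcribed from the table above as given in the thread; `describe` and the constant names are hypothetical, and this is not NVEncC's actual source.

```python
from typing import Dict, Tuple

# Transcribed from the table in this thread:
# (--preset value, --lossless given?) -> presetGUID name.
PRESET_GUIDS: Dict[Tuple[str, bool], str] = {
    ("performance", False): "NV_ENC_PRESET_HP_GUID",
    ("default", False): "NV_ENC_PRESET_DEFAULT_GUID",
    ("quality", False): "NV_ENC_PRESET_HQ_GUID",
    ("performance", True): "NV_ENC_PRESET_LOSSLESS_DEFAULT_GUID",
    ("default", True): "NV_ENC_PRESET_LOSSLESS_DEFAULT_GUID",
    ("quality", True): "NV_ENC_PRESET_LOSSLESS_HP_GUID",
}

# Per the thread, these rate control options enable the 2-pass mode.
TWO_PASS_RC = {"--cbrhq", "--vbrhq"}


def describe(preset: str, lossless: bool, rc_option: str) -> Tuple[str, bool]:
    """Return (presetGUID name, uses 2-pass) for a hypothetical option set."""
    guid = PRESET_GUIDS[(preset, lossless)]  # KeyError for presets not in the table
    two_pass = rc_option in TWO_PASS_RC      # independent of the preset choice
    return guid, two_pass


if __name__ == "__main__":
    print(describe("quality", False, "--vbrhq"))
```

Keeping the two lookups separate mirrors the point of the answer: "--preset quality" alone does not switch on 2-pass; the rate control option does.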

bigdwg71 commented 5 years ago

I wanted to close the loop on this...

My search for an SSIM tool led me to the VMAF project run by Netflix (makes sense): https://github.com/Netflix/vmaf

I ran two encodes of each of a few different files, one with the quality preset and one without. While the encoding speeds were drastically different, the VMAF scores the files received matched to within 3-4 decimal places (statistically the same). I am proceeding without the quality preset for now, until I get some evidence or data that pushes me the other way.
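A single pooled score can hide per-frame differences, so when comparing two encodes it may help to also look at the largest per-frame gap. The sketch below assumes you have already extracted per-frame VMAF scores for both encodes into two equal-length lists (extracting them depends on the VMAF tool and its output format, which is not covered here); `compare_scores` is a hypothetical helper.

```python
from statistics import fmean
from typing import Dict, Sequence


def compare_scores(a: Sequence[float], b: Sequence[float]) -> Dict[str, float]:
    """Summarize two equal-length lists of per-frame scores for the same source."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and the same length")
    diffs = [x - y for x, y in zip(a, b)]
    return {
        "mean_a": fmean(a),                          # pooled (mean) score of encode A
        "mean_b": fmean(b),                          # pooled (mean) score of encode B
        "mean_diff": fmean(diffs),                   # average per-frame gap, A minus B
        "max_abs_diff": max(abs(d) for d in diffs),  # worst single frame
    }


if __name__ == "__main__":
    print(compare_scores([90.0, 92.0], [90.0, 91.0]))
```

If the means agree but `max_abs_diff` is large, the two encodes differ noticeably on a few frames, which a pooled score alone would not reveal.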

If someone has other information or a different result, I would love to know. I am using VBRHQ as my rate control mode and didn't test any others, so that could play a role in the result, but for me it is good enough.

devil40xxx commented 5 years ago

It seems that using "--cu-max 32 --cu-min 32" boosts the speed even with the quality preset.

Will it really affect image quality? Would someone like to do some tests?
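One way to answer that is a controlled A/B test: encode the same source twice, changing only the CU size limits, then compare speed and quality (for example with SSIM or VMAF, as discussed above). Below is a minimal Python sketch that builds the two command lines. Only `--preset quality`, `--cu-max 32` and `--cu-min 32` come from this thread; the executable name `NVEncC64`, the `--codec hevc`, `--cbr 2250` (mirroring the CBR 2250 kbps setting in the log earlier in the thread) and `-i`/`-o` arguments, the file names, and the helper `ab_variants` are assumptions for illustration and should be checked against `NVEncC64 --help` for your version.

```python
import shlex
from typing import Dict, List


def ab_variants(base: List[str], src: str, out_stem: str) -> Dict[str, List[str]]:
    """Build NVEncC command lines that differ only in the CU size limits.

    `base`, `src` and `out_stem` are caller-chosen placeholders; only the
    "--cu-max"/"--cu-min" values come from the thread.
    """
    extras = {
        "cu_auto": [],                                  # encoder decides CU sizes
        "cu_32": ["--cu-max", "32", "--cu-min", "32"],  # setting from the thread
    }
    # base + extra creates new lists, so the caller's base list is not modified
    return {
        name: base + extra + ["-i", src, "-o", f"{out_stem}_{name}.265"]
        for name, extra in extras.items()
    }


if __name__ == "__main__":
    # Assumed base options; verify the flag spellings for your NVEncC version.
    base = ["NVEncC64", "--codec", "hevc", "--preset", "quality", "--cbr", "2250"]
    for name, cmd in ab_variants(base, "input.avs", "out").items():
        print(name, shlex.join(cmd))
```

Changing a single variable per run keeps any fps or quality difference attributable to the CU limits rather than to other settings.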

pwacooijmans commented 5 years ago

https://github.com/staxrip/staxrip/releases Try a newer version of StaxRip with the new ffms2; it will increase encoding speed by a lot, although you will get a slight decrease in quality (only slight, though).
https://www.dropbox.com/sh/4ctl2y928xkak4f/AAADEZj_hFpGQaNOdd3yqcAHa?dl=0 Or try a beta build; 2.0.4.10 is pretty damn stable.
https://onedrive.live.com/?authkey=%21AAgTVAIvaL0X%2D5E&id=604D4754F64B0ABC%219380&cid=604D4754F64B0ABC Or download from here.
https://ci.appveyor.com/project/Revan654/staxrip/build/artifacts Download the latest build here.

pwacooijmans commented 5 years ago

ffmsindex.zip: this is the old version 2.23, if you are looking for max quality like me.

rigaya commented 3 months ago

Closed as the conversation has been finished.