mdhiggins / sickbeard_mp4_automator

Automatically convert video files to a standardized format with metadata tagging to create a beautiful and uniform media library
MIT License

How to increase GPU usage? #1527

Closed andrerxavier closed 2 years ago

andrerxavier commented 2 years ago

My script is configured and working perfectly with my GPU, but I see that it only uses 12% of the maximum capacity. Is it possible to increase GPU usage?

mdhiggins commented 2 years ago

This is more of an FFMPEG question about hardware acceleration and you included nothing about your configuration

Please include information about your hardware, your current autoProcess settings, and a sample conversion log with debugging enabled if you would like help, though it may still be more of an FFMPEG question

mdhiggins commented 2 years ago

Start by reading https://trac.ffmpeg.org/wiki/HWAccelIntro

How to enable debug logging https://github.com/mdhiggins/sickbeard_mp4_automator/wiki/Debug-Level-Logging

Please include a log from a single conversion job where you're seeing the issue from start to finish

andrerxavier commented 2 years ago

I forgot to put the log and the autoprocess. autoProcess.txt sma.txt

mdhiggins commented 2 years ago

Alright so to break down what you're doing here

/usr/bin/ffmpeg -fix_sub_duration -hwaccel cuda -i "/home/andre/Vídeos/Matrix Resurrections 2021 WEB-DL 1080p DUAL 5.1.mkv" -vcodec h264_nvenc -map 0:0 -field_order progressive -vb 2377k -metadata:s:v BPS=2377000 -metadata:s:v BPS-eng=2377000 -metadata:s:v title=FHD -metadata:s:v handler_name=FHD -c:a:0 aac -map 0:1 -ac:a:0 2 -b:a:0 256k -metadata:s:a:0 BPS=256000 -metadata:s:a:0 BPS-eng=256000 -filter:a:0 pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE -metadata:s:a:0 title=Stereo -metadata:s:a:0 handler_name=Stereo -metadata:s:a:0 language=por -disposition:a:0 -default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -strict experimental -c:a:1 aac -map 0:1 -ac:a:1 6 -b:a:1 384k -metadata:s:a:1 BPS=384000 -metadata:s:a:1 BPS-eng=384000 -metadata:s:a:1 "title=5.1 Channel" -metadata:s:a:1 "handler_name=5.1 Channel" -metadata:s:a:1 language=por -disposition:a:1 +default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -strict experimental -c:a:2 aac -map 0:2 -ac:a:2 2 -b:a:2 256k -metadata:s:a:2 BPS=256000 -metadata:s:a:2 BPS-eng=256000 -filter:a:2 pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE -metadata:s:a:2 title=Stereo -metadata:s:a:2 handler_name=Stereo -metadata:s:a:2 language=eng -disposition:a:2 -default-dub+original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -strict experimental -c:a:3 copy -map 0:2 -metadata:s:a:3 "title=5.1 Channel" -metadata:s:a:3 "handler_name=5.1 Channel" -metadata:s:a:3 language=eng -disposition:a:3 -default-dub+original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -c:s:0 mov_text -map 0:3 -metadata:s:s:0 title=Forced -metadata:s:s:0 handler_name=Forced -metadata:s:s:0 language=por -disposition:s:0 +default-dub-original-comment-lyrics-karaoke+forced-hearing_impaired-visual_impaired-captions -c:s:1 mov_text -map 0:4 -metadata:s:s:1 title= 
-metadata:s:s:1 handler_name= -metadata:s:s:1 language=por -disposition:s:1 -default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -f mp4 -threads 1 -metadata:g encoding_tool=SMA -y "/home/andre/Vídeos/Matrix Resurrections 2021 WEB-DL 1080p DUAL 5.1.mp4"

That's your actual FFMPEG command that gets generated, and it looks like you're getting hardware accelerated encoding but not decoding.

You can see in the debug log that the script finds a valid potential decoder

2022-02-02 19:37:45 - MANUAL - DEBUG - Decoder: h264_cuda.

but because that's not on your list of approved decoders, it doesn't get applied. To fix this, please add that decoder to your hwaccel-decoders option, example below

hwaccel-decoders = h264_cuda, hevc_cuda, h264_cuvid, mjpeg_cuvid, mpeg1_cuvid, mpeg2_cuvid, mpeg4_cuvid, vc1_cuvid, hevc_qsv, h264_qsv, hevc_vaapi, h264_vaapi, h265_qsv, hevc_qsv

I don't believe h264_nvenc is a valid decoder (I suspect you added it on top of the defaults), so I removed it in my sample, added h264_cuda, and also added hevc_cuda in case you encounter an hevc source file in the future (the decoder is determined by the input file's codec, not the output codec)
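The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SMA's actual code: the function names `parse_decoder_list` and `pick_hw_decoder` are hypothetical, and the assumption that the candidate name is built as `<input codec>_<hwaccel>` is inferred from the `h264_cuda` line in the debug log.

```python
def parse_decoder_list(option_value):
    """Split a comma-separated hwaccel-decoders value into clean names."""
    return [name.strip() for name in option_value.split(",") if name.strip()]


def pick_hw_decoder(input_codec, hwaccel, approved):
    """Return the hardware decoder for the INPUT codec, or None.

    Assumption (hypothetical naming rule): the candidate is
    '<input codec>_<hwaccel>'. It is only used when it appears
    on the approved list.
    """
    candidate = f"{input_codec}_{hwaccel}"
    return candidate if candidate in approved else None


approved = parse_decoder_list("h264_cuvid, hevc_cuvid, h264_qsv")

# h264 source with the cuda hwaccel: the candidate is not approved yet.
print(pick_hw_decoder("h264", "cuda", approved))

# After adding it to the list, the same source gets a hardware decoder.
approved.append("h264_cuda")
print(pick_hw_decoder("h264", "cuda", approved))
```

Note that the output codec (for example h264_nvenc) never enters the lookup; only the codec of the source file does.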

There are also some additional options that should be set to ensure that the hardware decoder and encoder work without a software intermediate layer, by keeping everything on the same device and in the cuda format (which needs to be preserved to pass frames directly from decoder to encoder on the GPU without an intermediate CPU step)

In order to get the script to include these elements you need to adjust your current options:

hwdevices = vaapi:/dev/dri/renderD128
hwaccel-output-format = vaapi:vaapi

Change hwdevices to include a device for cuda which is what you're using and explicitly include cuda format info

hwdevices = cuda:/dev/dri/yourGPU
hwaccel-output-format = cuda:cuda

That tells SMA to use the cuda format for the cuda hwaccel option (the two are not always the same; you could use a mix of cuda and cuvid, which is why this needs to be stated explicitly). You will also need to modify that example to include the reference to your GPU

ffmpeg -f lavfi -i nullsrc -c:v h264_nvenc -gpu list -f null -

That command should print a list of valid GPUs to help get you started on what value needs to go there
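To make the hwdevices and hwaccel-output-format options more concrete, here is a rough Python sketch of how `key:value` pairs like these could be turned into FFmpeg input options. The helper names are hypothetical and this is not how SMA is known to build its command; the FFmpeg options `-hwaccel`, `-hwaccel_device`, and `-hwaccel_output_format` are real, but the exact arguments SMA emits may differ. The device value `0` is a placeholder.

```python
def parse_pairs(option_value):
    """Parse 'key:value, key:value' into a dict, splitting on the first ':'."""
    pairs = {}
    for item in option_value.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(":")
        pairs[key.strip()] = value.strip()
    return pairs


def hwaccel_input_args(hwaccel, hwdevices, output_formats):
    """Build input-side FFmpeg options for one hwaccel (illustrative only)."""
    args = ["-hwaccel", hwaccel]
    device = parse_pairs(hwdevices).get(hwaccel)
    if device:
        args += ["-hwaccel_device", device]
    fmt = parse_pairs(output_formats).get(hwaccel)
    if fmt:
        args += ["-hwaccel_output_format", fmt]
    return args


# With only vaapi entries configured, the cuda hwaccel gets no device or format.
print(hwaccel_input_args("cuda", "vaapi:/dev/dri/renderD128", "vaapi:vaapi"))

# With cuda entries, frames can stay in the cuda format between decoder and encoder.
print(hwaccel_input_args("cuda", "cuda:0", "cuda:cuda"))
```

The first call shows why the original vaapi-only settings had no effect on a cuda pipeline: nothing in them is keyed to cuda.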

Finally, after all is said and done and you have everything working, you will likely still not get full GPU usage. Encoders are limited in the number of cores they can use at once, and you are often bottlenecked by other parts of the process (audio encoding is still done on the CPU, hard drive read/write speeds, etc.). It also looks like you're using a fairly low bitrate with minimal quality options, so the GPU may just not need to work that hard. You'll need to factor all of that into utilization.

If you play around with your FFMPEG command and find a variation that does something in a more efficient fashion feel free to post it and I'll make some suggestions about how to achieve that in the script

andrerxavier commented 2 years ago

Thank you very much for the directions! I made the changes and this is the result of the command mentioned above

LUNAR:~# ffmpeg -f lavfi -i nullsrc -c:v h264_nvenc -gpu list -f null -

ffmpeg version 4.4-6ubuntu5 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-7ubuntu1)
  configuration: --prefix=/usr --extra-version=6ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, lavfi, from 'nullsrc':
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 320x240 [SAR 1:1 DAR 4:3], 25 tbr, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (rawvideo (native) -> h264 (nvenc))
Press [q] to stop, [?] for help
[nvenc @ 0x5589dc9ed080] This encoder is deprecated, use 'h264_nvenc' instead
[nvenc @ 0x5589dc9ed080] [ GPU #0 - < NVIDIA GeForce GTX 1050 > has Compute SM 6.1 ]

mdhiggins commented 2 years ago

GPU #0: so for nvenc it looks like you just specify a GPU number, in this case 0

You can try

hwdevices = cuda:0

If that gives you problems, you can probably just leave hwdevices blank since it looks like you'll only have one valid device anyway
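As a small illustration of reading the device number out of that output, the Python sketch below extracts GPU indexes and names from lines shaped like the `[ GPU #0 - < ... > ...]` line in the log above. The regular expression is an assumption based on that single sample line, and `parse_gpu_list` is a hypothetical helper; other FFmpeg or driver versions may format the list differently.

```python
import re

# Pattern inferred from the one sample line in this thread (an assumption).
GPU_LINE = re.compile(r"\[\s*GPU #(\d+)\s*-\s*<\s*(.+?)\s*>")


def parse_gpu_list(ffmpeg_output):
    """Map GPU index to GPU name for every matching line in the output."""
    return {int(m.group(1)): m.group(2) for m in GPU_LINE.finditer(ffmpeg_output)}


sample = (
    "[nvenc @ 0x5589dc9ed080] "
    "[ GPU #0 - < NVIDIA GeForce GTX 1050 > has Compute SM 6.1 ]"
)

gpus = parse_gpu_list(sample)
for index, name in gpus.items():
    # Suggest the matching hwdevices value for each detected GPU.
    print(f"{name}: hwdevices = cuda:{index}")
```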

andrerxavier commented 2 years ago

Modifications made, but performance dropped after the changes. Log and config follow.

autoProcess.txt Captura de tela de 2022-02-04 11-03-41 sma.txt

mdhiggins commented 2 years ago

You appear to have completely changed encoders and primary codecs since the original post in this thread. You are now converting from h264 to hevc rather than to h264 as you were initially, which will produce very different results and may have different GPU utilization rates

Also, it looks like the cuda decoder still isn't getting applied here. I looked more closely at your logs, and your FFMPEG build doesn't have an h264_cuda decoder (my guess is that a dedicated h264_cuda decoder doesn't exist and you need to use cuvid in this case)

{
   "h264": {
      "decoders": [
         "h264",
         "h264_v4l2m2m",
         "h264_qsv",
         "h264_cuvid"
      ],
      "encoders": [
         "libx264",
         "libx264rgb",
         "h264_nvenc",
         "h264_omx",
         "h264_qsv",
         "h264_v4l2m2m",
         "h264_vaapi",
         "nvenc",
         "nvenc_h264"
      ]
   }
}

You can try switching to cuvid instead which should still keep things on the GPU and will remain within your supported decoders

hwaccels = cuvid
hwaccel-decoders = h264_cuvid, hevc_cuvid, h264_cuda, hevc_cuda, mjpeg_cuvid, mpeg1_cuvid, mpeg2_cuvid, mpeg4_cuvid, vc1_cuvid, hevc_qsv, h264_qsv, hevc_vaapi, h264_vaapi, h265_qsv, hevc_qsv
hwdevices = cuvid:0, cuda:0
hwaccel-output-format = cuvid:cuvid, cuda:cuda

Also, overall GPU core utilization may not be the best metric here. Again, bottlenecks elsewhere may be the issue, and as you remove the software layer you may see more efficient use of the GPU, with less conversion required and therefore less work on the GPU

https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/

This guide is particularly helpful if you're new to nvidia transcoding and has some information towards the bottom about monitoring performance bottlenecks
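One way to look past the single overall GPU percentage is to read the separate encoder and decoder figures. The Python sketch below parses text in the shape printed by `nvidia-smi dmon -s u -c 1`; the sample text and its numbers are made up for illustration, the column set varies between driver versions (which is why the parser reads the header instead of assuming positions), and `parse_dmon` and `to_percent` are hypothetical helper names.

```python
def parse_dmon(text):
    """Parse nvidia-smi dmon style text into a list of {column: value} dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first comment line holds column names; later ones hold units.
            if header is None:
                header = line.lstrip("#").split()
            continue
        if header is not None:
            rows.append(dict(zip(header, line.split())))
    return rows


def to_percent(value):
    """Convert a column value to int, or None for '-' or missing values."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Made-up sample in the assumed output shape; not a real measurement.
sample = """\
# gpu    sm   mem   enc   dec
# Idx     %     %     %     %
    0    12     5    95    40
"""

for row in parse_dmon(sample):
    print(
        "GPU", row.get("gpu"),
        "cores:", to_percent(row.get("sm")),
        "encoder:", to_percent(row.get("enc")),
        "decoder:", to_percent(row.get("dec")),
    )
```

In a reading like the made-up one above, low core usage next to a busy encoder would suggest the dedicated encoder block, not the CUDA cores, is the limiting part.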

andrerxavier commented 2 years ago

Yes, it's my first experience with the script using an Nvidia GPU. I'm carrying out several tests to find the most efficient and fastest way to transcode my media. I will continue reading on the subject and report the changes here. Thanks a lot for the help

mdhiggins commented 2 years ago

I would at least try switching to cuvid as the hwaccel / decoder and see if that nets some performance gains

mdhiggins commented 2 years ago

Using a mix of CUVID/CUDA settings and nvenc, going from a 4K hevc bluray rip to h264, I'm seeing near 100% utilization, still with around 20-25% CPU utilization, and the video encoder and decoder both look like they are seeing decent utilization

All audio is just remuxing

This is on an RTX 3090 FE and a 7200 RPM HDD, CPU is a Ryzen 5900X

I used the following settings

[Converter]
hwaccels = cuda
hwaccel-decoders = hevc_cuvid, h264_cuvid
hwdevices = cuda:0
hwaccel-output-format = cuda:cuvid

...

[Video]
codec = h264_nvenc

I made a small update that allows hwaccel-output-format to be used in decoder selection, as opposed to just the hwaccel option, which lets me do the cuda/cuvid crossover. You'll need to update for these settings to work

6b179165f2b040bb0bbc7e751e6e2af4a2664643
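The cuda/cuvid crossover can be pictured with the Python sketch below. It is a guess at the general idea only, not the code from that commit: the helper names are hypothetical, and the rule that candidates are named `<codec>_<hwaccel>` and `<codec>_<output format>` is an assumption.

```python
def candidate_decoders(codec, hwaccel, output_formats):
    """List decoder names to try: first by hwaccel, then by its output format."""
    names = [f"{codec}_{hwaccel}"]
    fmt = output_formats.get(hwaccel)
    if fmt and fmt != hwaccel:
        names.append(f"{codec}_{fmt}")
    return names


def select_decoder(codec, hwaccel, output_formats, approved):
    """Return the first candidate found on the approved list, or None."""
    for name in candidate_decoders(codec, hwaccel, output_formats):
        if name in approved:
            return name
    return None


# Values taken from the settings posted above.
approved = ["hevc_cuvid", "h264_cuvid"]
output_formats = {"cuda": "cuvid"}

# hwaccels = cuda, hevc source: hevc_cuda is not approved, hevc_cuvid is.
print(select_decoder("hevc", "cuda", output_formats, approved))
```

Without the second candidate, a cuda hwaccel paired only with cuvid decoders would find no match at all.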

mdhiggins commented 2 years ago

I mostly chose this path, as opposed to using hwaccels = cuvid, because I can't seem to find a precompiled binary of ffmpeg for windows with that available, and I don't feel like compiling from source just for this test. You can check what your ffmpeg binary supports by running ffmpeg -hwaccels

Looking at your previous logs

2022-02-03 16:44:47 - MANUAL - DEBUG - FFMPEG hwaccels:
2022-02-03 16:44:47 - MANUAL - DEBUG - ['vdpau', 'cuda', 'vaapi', 'qsv', 'drm', 'opencl']

So you might need to do the same thing I did above since cuvid isn't on your list either
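The check described here can be scripted. The Python sketch below parses the text printed by `ffmpeg -hwaccels` and picks the first preferred method the build supports. The sample text mirrors the list from the debug log above; `parse_hwaccels` and `choose_hwaccel` are hypothetical helpers, and the preference order is only an example.

```python
def parse_hwaccels(output):
    """Return the method names listed after the 'Hardware acceleration methods' header."""
    methods = []
    seen_header = False
    for line in output.splitlines():
        line = line.strip()
        if line.lower().startswith("hardware acceleration methods"):
            seen_header = True
            continue
        if seen_header and line:
            methods.append(line)
    return methods


def choose_hwaccel(available, preferred):
    """Pick the first preferred hwaccel that the FFmpeg build supports."""
    for name in preferred:
        if name in available:
            return name
    return None


# Same methods as in the debug log quoted above.
sample = """Hardware acceleration methods:
vdpau
cuda
vaapi
qsv
drm
opencl
"""

available = parse_hwaccels(sample)
# cuvid is not listed for this build, so the choice falls back to cuda.
print(choose_hwaccel(available, ["cuvid", "cuda"]))
```

To run it against a real build, the sample string could be replaced with the captured output of `ffmpeg -hide_banner -hwaccels`.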

mdhiggins commented 2 years ago

Going to close this out for now since it's more or less just a usage question, but feel free to continue the discussion and I will continue to reply

andrerxavier commented 2 years ago

Thanks @mdhiggins. The load really was split about half and half between GPU and CPU. I'll read a little more on the subject and run new tests.