nicknsy / jellyscrub

Smooth mouse-over video scrubbing previews for Jellyfin.
MIT License

Hardware acceleration during ffmpeg usage? #12

Closed satmandu closed 1 year ago

satmandu commented 2 years ago

Jellyfin is really good about using hardware acceleration for FFmpeg when selected in options.

Jellyscrub doesn't appear to be using those hardware acceleration options. Is there a way to enable that?

nicknsy commented 2 years ago

Will look into this. Hopefully it will make generation faster.

SandyRodgers-2017 commented 2 years ago

This command from the ffmpeg wiki could help with hardware acceleration:

ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -r 1/10 -c:v mjpeg_vaapi -global_quality 90 -f image2 output%03d.jpeg

It seems that VAAPI can produce JPEGs, and bifs can be made from those. I got it from this link: https://trac.ffmpeg.org/wiki/Hardware/VAAPI. I will have to do some testing myself, but I hope this helps.

SandyRodgers-2017 commented 2 years ago

The above command worked extremely well. It chewed through an entire 3 hour 16 minute movie in less than 10 minutes at full resolution, 1920x1080, with less than 20 percent CPU usage on my i7-7700; compare this to the 80% for just 240p. My GPU, an RX 480, however doesn't support MJPEG encoding using VAAPI, so I can't use that. There may be some other form of hardware-accelerated JPEG creation, but I don't know about it. My finalized command looks like this:

/usr/lib/jellyfin-ffmpeg/ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mkv -autoscale 0 -r 1/10 -vf 'scale_vaapi=format=nv12:w=240:h=135' -c:v mjpeg_vaapi img%08d.jpg

nicknsy commented 2 years ago

The problem is I can't assume everyone has a VAAPI-supported device; for instance, I personally have an AMD CPU. However, I did try to get this working previously, but with Nvidia CUDA hardware decoding it took almost 80% longer to generate than software encoding. No idea why this happens, but maybe I can just use VAAPI for devices that support it.

SandyRodgers-2017 commented 2 years ago

The problem is I can't assume everyone has a VAAPI-supported device; for instance, I personally have an AMD CPU. However, I did try to get this working previously, but with Nvidia CUDA hardware decoding it took almost 80% longer to generate than software encoding. No idea why this happens, but maybe I can just use VAAPI for devices that support it.

Yes, the sad thing is only Intel supports VAAPI MJPEG creation; AMD doesn't seem to have it at all, and Nvidia uses CUDA. I've been looking for GPU-based examples of JPEG creation, but all of the things I've been finding are just people asking AMD to implement it. There is some talk about OpenCL/Vulkan being used to encode JPEGs or for scaling, which I think would make AMD GPUs better at JPEG creation and video processing. Here are some of the relevant links:

https://stackoverflow.com/questions/20953729/opencl-video-processing#20954813
https://github.com/anknown/opencl-jpeg
https://github.com/roehrdor/opencl-jpeg-encoder
https://ffmpeg.org/ffmpeg-all.html#OpenCL-Video-Filters
https://stackoverflow.com/questions/55687189/how-to-use-gpu-to-accelerate-the-processing-speed-of-ffmpeg-filter/55747785

However, I think the above would take extensive rewrites.

satmandu commented 2 years ago

Is it possible to use/detect the ffmpeg options set inside jellyfin?

e.g. the relevant ffmpeg and hardware acceleration settings in the Jellyfin dashboard (screenshots omitted).

nyanmisaka commented 2 years ago

Only Intel graphics can encode JPEG images with its fixed-function hardware via VAAPI or QSV. Neither AMD nor Nvidia can do that, but hardware decoding and scaling can still be done on the GPU to help offload the CPU.

Intel: hw dec -> hw scale -> hw enc

AMD & Nvidia: hw dec -> hw scale -> hw download -> sw enc
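
As a rough sketch of those two pipelines, based on the commands shared elsewhere in this thread (the device path, resolution, and 1/10 interval are just example values):

# Intel (VAAPI): decode, scale, and MJPEG-encode entirely on the GPU
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mkv -r 1/10 -vf 'scale_vaapi=format=nv12:w=240:h=135' -c:v mjpeg_vaapi -f image2 img%08d.jpg

# AMD via VAAPI (Nvidia's CUDA equivalent, using scale_cuda, appears later in the thread):
# decode and scale on the GPU, download the frames, then encode the JPEGs in software
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mkv -r 1/10 -vf 'scale_vaapi=format=nv12:w=240:h=135,hwdownload,format=nv12' -f image2 img%08d.jpg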

satmandu commented 2 years ago

Hardware decoding/scaling alone would be a huge help for this plugin.

SandyRodgers-2017 commented 2 years ago

I just made a one-liner command for hardware decoding and scaling, and it really does make a difference. It uses about the same amount of CPU power as just using Intel's hardware decoding alone, even though AMD does seem to be a little slower, about 20 percent or less on an i7-7700. Here it is for AMD:

/usr/lib/jellyfin-ffmpeg/ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/"renderdevice" -hwaccel_output_format vaapi -i "inputfile" -autoscale 0 -r 1/10 -vf 'scale_vaapi=format=nv12:w=240:h=135,hwdownload,format=nv12' -f image2 "outputdirectory"/%08d.jpg

I am getting this warning though, even though the internet says it doesn't matter: [swscaler @ 0x562b3d85dd80] [swscaler @ 0x562b3d88f040] deprecated pixel format used, make sure you did set range correctly

satmandu commented 2 years ago

@SandyRodgers-2017 How do we get jellyscrub to use that command line automatically?

SandyRodgers-2017 commented 2 years ago

@SandyRodgers-2017 How do we get jellyscrub to use that command line automatically?

I'm not really much of a programmer, and I don't know git well enough to make my own commits. I've just been using the command line to do the tests, and after I saw how well they worked I just created a bash script. I've actually been using this bash script that I made to create the bifs instead of Jellyscrub. However, the file that would need to be edited is this one, around line 70, maybe a little before: https://github.com/nicknsy/jellyscrub/blob/main/Nick.Plugin.Jellyscrub/Drawing/OldMediaEncoder.cs

and you would need to add a button to this file:
https://github.com/nicknsy/jellyscrub/blob/main/Nick.Plugin.Jellyscrub/Configuration/configPage.html
and use this file to tell OldMediaEncoder.cs what your configuration is:
https://github.com/nicknsy/jellyscrub/blob/main/Nick.Plugin.Jellyscrub/Configuration/PluginConfiguration.cs
And that's not including packaging the changes.

My bash script is here and I hope that helps. Change the extension to .sh after downloading. However, it is still not a fully functional script; you will have to open the file, read it, and fill in the correct information:

trickplay.txt

and download biftool from:

https://developer.roku.com/docs/developer-program/media-playback/trick-mode/bif-file-creation.md

or

https://github.com/rokudev/samples/tree/master/utilities/bif%20tool
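
For reference, the overall flow such a script automates looks roughly like this (a minimal sketch, not the attached script; paths are illustrative, and it assumes biftool's default 10000 ms frame interval, which matches -r 1/10):

mkdir -p /tmp/bifframes
/usr/lib/jellyfin-ffmpeg/ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i "input.mkv" -autoscale 0 -r 1/10 -vf 'scale_vaapi=format=nv12:w=240:h=135,hwdownload,format=nv12' -f image2 /tmp/bifframes/img%08d.jpg
# biftool packs the img%08d.jpg frames in that directory into a .bif file
./biftool /tmp/bifframes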

JVT038 commented 2 years ago

Any update on this? Generating bif files currently takes basically all available CPU power, ranging from 30% to 70% sometimes...

nicknsy commented 2 years ago

I might go back to the hw accel I had already implemented. It's a bit slower on some devices, but I guess it would relieve CPU usage. Dunno.

christovic commented 2 years ago

Just modified @SandyRodgers-2017's script, works really well. Getting through 1080p files in about 5 minutes with an interval of 5 seconds. It also allows me to exclude my 4K files, which were requiring tone mapping to generate the bif files, so were taking about an hour each. I'm using Intel QuickSync on an i3-6100 in case anyone is interested. intel_gpu_top shows nearly 100% usage while CPU hovers at around 20%, although there is a fair amount of other stuff running on this system, including other ffmpeg processes for cameras.
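
One way a script like this can skip the 4K files is to probe the width first; a hypothetical sketch (not part of the shared script; the ffprobe call and the 1920-pixel threshold are my own assumptions):

# inside the per-file loop: read the video width and skip anything wider than 1080p
width=$(ffprobe -v error -select_streams v:0 -show_entries stream=width -of default=noprint_wrappers=1:nokey=1 "$file")
if [ "$width" -gt 1920 ]; then
    echo "Skipping $file (4K)"
    continue
fi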

hung0702 commented 2 years ago

/usr/lib/jellyfin-ffmpeg/ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mkv -autoscale 0 -r 1/10 -vf 'scale_vaapi=format=nv12:w=240:h=135' -c:v mjpeg_vaapi img%08d.jpg

Could you help me adapt this for my setup? 320px, 10.8.4 on Windows. Metadata stored within folders, every movie a folder, series>seasons>eps.

Maybe something like

"C:\Users\Hung\Apps\Jellyfin\Server\ffmpeg.exe" -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mkv -autoscale 0 -r 1/10 -vf 'scale_vaapi=format=nv12:w=320:h=180' -c:v mjpegvaapi img%08d.jpg

However, I'm not familiar with how ffmpeg works. How does it initialize hwaccel_devices? ffmpeg.org was a bit slim but thankfully there was a wiki that suggests I don't have to change anything bolded? Also do you know if I could use qsv or does this process require vaapi? My system (8365u) supports transcoding HEVC 10-bit 5.1 and all of my content seems supported per QuickSync.

SandyRodgers-2017 commented 2 years ago

/usr/lib/jellyfin-ffmpeg/ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mkv -autoscale 0 -r 1/10 -vf 'scale_vaapi=format=nv12:w=240:h=135' -c:v mjpeg_vaapi img%08d.jpg

Could you help me adapt this for my setup? 320px, 10.8.4 on Windows. Metadata stored within folders, every movie a folder, series>seasons>eps.

Maybe something like

"C:\Users\Hung\Apps\Jellyfin\Server\ffmpeg.exe" -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mkv -autoscale 0 -r 1/10 -vf 'scale_vaapi=format=nv12:w=320:h=180' -c:v mjpegvaapi img%08d.jpg

However, I'm not familiar with how ffmpeg works. How does it initialize hwaccel_devices? ffmpeg.org was a bit slim but thankfully there was a wiki that suggests I don't have to change anything bolded? Also do you know if I could use qsv or does this process require vaapi? My system (8365u) supports transcoding HEVC 10-bit 5.1 and all of my content seems supported per QuickSync.

The above one-liner is for Linux (and maybe Docker) and only creates images, not bifs. It was a way for me to test whether my hardware acceleration would work and to propose an ffmpeg command that could be used in Jellyscrub itself. To explain it, I will break it into parts:

/usr/lib/jellyfin-ffmpeg/ffmpeg tells the script which ffmpeg binary to use

-hwaccel vaapi chooses the hardware acceleration type

-hwaccel_device /dev/dri/renderD128 tells ffmpeg which hardware acceleration device to use on Linux; in my case this is the path that points to my Intel iGPU

-hwaccel_output_format vaapi tells the device that the output it produces will also stay in VAAPI (GPU-side) frames

-i input.mkv input file

-autoscale 0 controls ffmpeg's automatic scaling of the video to the resolution of the first frame; 0 disables it, so the filter graph's output resolution is used as-is

-r 1/10 stands for rate 1/10, which is equal to one thumbnail every 10 seconds

-vf 'scale_vaapi=format=nv12:w=240:h=135': -vf is the video filter option; scale_vaapi= calls the VAAPI scale function; format=nv12 tells scale to use the NV12 pixel format, the only one Intel supports on my CPU; w=240 sets the width to 240 and h=135 sets the height to 135, which is 16:9

-c:v mjpeg_vaapi: -c:v stands for video codec; mjpeg_vaapi tells the Intel iGPU to use hardware-accelerated JPEG creation

img%08d.jpg is the output name pattern; it matches what is necessary for biftool to work and stands for img[8-digit number].jpg

For Windows I think it will be something like this. First, make sure Quick Sync is enabled: https://www.intel.com/content/www/us/en/support/articles/000029338/graphics.html

"C:\Users\Hung\Apps\Jellyfin\Server\ffmpeg.exe" -hwaccel qsv -hwaccel_output_format qsv -i input.mkv -autoscale 0 -r 1/10 -vf 'scale_qsv=format=nv12:w=320:h=180:force_original_aspect_ratio=decrease' -c:v mjpegqsv "C:[DIRECTORY]\img%08d.jpg"

I don't have a Windows PC so I can't really test this for Windows, but it should work. If it does, and you know PowerShell scripting, you could make a script that uses that plus biftool to make bifs. However, you may be able to use Windows Subsystem for Linux to run my above bash script, or my newly updated one below; follow the same directions as in my previous comment. trickplay.txt

SandyRodgers-2017 commented 2 years ago

Just modified @SandyRodgers-2017's script, works really well. Getting through 1080p files in about 5 minutes with an interval of 5 seconds. It also allows me to exclude my 4K files, which were requiring tone mapping to generate the bif files, so were taking about an hour each. I'm using Intel QuickSync on an i3-6100 in case anyone is interested. intel_gpu_top shows nearly 100% usage while CPU hovers at around 20%, although there is a fair amount of other stuff running on this system, including other ffmpeg processes for cameras.

Hi christovic,

The previous script was a little flawed due to me not understanding how much the -t value mattered when it came to using biftool. If that value doesn't match the frame interval, the video previews will be slightly or wildly off. I didn't originally notice this because I kept the values at their defaults (1/10 and -t 10000), but if you change that value then the screens are wrong. Hopefully you noticed and fixed this issue. To get the correct -t value for biftool, take the denominator of your fps (or -r) and multiply it by 1000; that matches the -t value. If you didn't notice this, the new trickplay.txt above fixes that issue and adds a number of other arguments to help with usage.

I also changed -r to the fps filter to better match what I expect to happen and to increase the predictability of thumbnail creation. My reasoning comes from what I read here: https://stackoverflow.com/questions/51143100/framerate-vs-r-vs-filter-fps. -r changes the timing of the file, which can lead to the thumbs being off by anywhere from a few hundred milliseconds to a second; fps, by contrast, drops or duplicates frames to get the desired rate. In my testing, fps gave me more consistent, predictable, and correct thumbnails when scrubbing.
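
In other words, a minimal sketch of the relationship described above (variable names are illustrative, and it assumes biftool's -t flag takes the frame interval in milliseconds as described here):

# seconds between thumbnails
interval=5
# the fps filter drops/duplicates frames to produce exactly one frame every $interval seconds
ffmpeg -i input.mkv -vf "fps=1/${interval},scale=w=320:h=180" -f image2 frames/img%08d.jpg
# biftool's -t must be the same interval expressed in milliseconds
./biftool -t $((interval * 1000)) frames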

devenator commented 1 year ago

Only Intel graphics can encode JPEG images with its fixed-function hardware via VAAPI or QSV. Neither AMD nor Nvidia can do that, but hardware decoding and scaling can still be done on the GPU to help offload the CPU.

Nvidia can do that via CUDA:

CUDA 10 comes with these other components:

nvJPEG – Hybrid (CPU and GPU) JPEG processing

CUDA 11.0-11.7 comes with these other components:

nvJPEG2000 – JPEG 2000 encoder and decoder

NVENC/NVDEC are independent cores from the CUDA cores, but the libraries and driver for the former require the latter (i.e. for scaling), so everyone who has working NVENC/NVDEC already has the required packages installed...

But as a more general question: instead of re-inventing the wheel with ffmpeg hardware encoding, would it not be better to ask the Jellyfin developers if they would implement JPEG encoding in the core? At least to my knowledge, Jellyfin currently only supports video transcoding; if JPEG (or even JPEG 2000) hardware encoding were implemented there, everything, including plugins, would benefit from it...

nyanmisaka commented 1 year ago

Well, GPGPU does accelerate JPEG encoding. We always welcome anyone implementing this new encoder in ffmpeg.

memehammad commented 1 year ago

Has anyone tried adding CUDA to @SandyRodgers-2017's script? I've asked ChatGPT to do it but it doesn't work.

memehammad commented 1 year ago

Nvm, here it is:

-hwaccel cuda -hwaccel_device 0 -hwaccel_output_format cuda -i "$currentdirectory/$file" -autoscale 0 -start_number 0 -vf "fps=1/$vfps,scale_cuda=format=nv12:w=240:h=135:force_original_aspect_ratio=decrease,hwdownload,format=nv12" -f image2 "$bifdir"/%08d.jpg

Edit: this only seems to work for H.264 and H.265 files.

SandyRodgers-2017 commented 1 year ago

Nvm, here it is:

-hwaccel cuda -hwaccel_device 0 -hwaccel_output_format cuda -i "$currentdirectory/$file" -autoscale 0 -start_number 0 -vf "fps=1/$vfps,scale_cuda=format=nv12:w=240:h=135:force_original_aspect_ratio=decrease,hwdownload,format=nv12" -f image2 "$bifdir"/%08d.jpg

Edit: this only seems to work for H.264 and H.265 files.

It only works with H.264 and H.265 files because that is what your GPU supports when it comes to hardware encoding and decoding. It could also be an ffmpeg problem if you haven't updated. You will need to make a fallback to software encoding and decoding for other codecs. If you are on Linux and use Mesa and VAAPI, then you can use this to see what hardware encoders your GPU supports: vainfo --display drm --device /dev/dri/renderD128

Here is a script below that adds the fallback for AV1; you can use it as a template to add other fallbacks as necessary. I may change it so that the script can auto-detect which codecs your GPU supports for hardware encoding, but for now you just add them manually. Rudimentary changelog: adds a software fallback for unsupported video codecs; adds vertical padding for content that is not 16:9.

trickplay.txt
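
For anyone curious what such a fallback can look like, here is a hypothetical sketch (not taken from the attached script; the ffprobe probe, the supported-codec list, and the variable names are my own assumptions):

# probe the input's video codec and only use VAAPI for codecs the GPU can decode
codec=$(ffprobe -v error -select_streams v:0 -show_entries stream=codec_name -of default=noprint_wrappers=1:nokey=1 "$inputfile")
case "$codec" in
  h264|hevc)
    hwargs="-hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi"
    filters="fps=1/10,scale_vaapi=format=nv12:w=240:h=135,hwdownload,format=nv12"
    ;;
  *)
    # e.g. AV1 on a GPU without an AV1 decoder: plain software path
    hwargs=""
    filters="fps=1/10,scale=w=240:h=135"
    ;;
esac
ffmpeg $hwargs -i "$inputfile" -autoscale 0 -vf "$filters" -f image2 "$outdir"/img%08d.jpg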

devenator commented 1 year ago

Just one note: the scripts above for Nvidia do hardware decoding and scaling, but the JPEG encoding is still software. It's already much faster this way (by about a factor of 8-10), so many thanks for this, but for a full hardware pipeline we have to wait for the ffmpeg devs to implement nvJPEG...

NeuroDawg commented 1 year ago

Any movement on getting HW-accelerated decoding on Nvidia GPUs brought to this plugin?

kub3let commented 1 year ago

@SandyRodgers-2017 thanks a ton for your script, without it my NAS would have choked at 80°C for days.

Hardware acceleration really needs to be integrated into jellyscrub.

Script setup notes for linuxserver.io jellyfin

docker exec -it jellyfin bash

apt update && apt install unzip sudo nano wget
wget https://raw.githubusercontent.com/rokudev/samples/master/utilities/bif%20tool/biftool_linux.zip
unzip biftool_linux.zip
chown abc:abc biftool

wget https://github.com/nicknsy/jellyscrub/files/9472017/trickplay.txt
mv trickplay.txt trickplay.sh
chown abc:abc trickplay.sh

nano trickplay.sh
# adjust biftool path & trickplay path

sudo -u abc bash trickplay.sh -d

# in case you want to delete all trickplays for some reason
find . -type d -name "trickplay" -exec rm -rf "{}" \;

satmandu commented 1 year ago

It would be nice for the script to use acceleration, and then fall back to no acceleration only if an appropriate GPU isn't available.
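
A small sketch of that kind of detection (hypothetical; the render-node path is just the usual VAAPI default and will differ between systems):

# only request VAAPI when a DRM render node exists, otherwise run fully in software
if [ -e /dev/dri/renderD128 ]; then
    # decode on the GPU; frames are copied back to system memory for the software filters
    hwargs="-hwaccel vaapi -hwaccel_device /dev/dri/renderD128"
else
    hwargs=""
fi
ffmpeg $hwargs -i input.mkv -autoscale 0 -vf "fps=1/10,scale=w=240:h=135" -f image2 img%08d.jpg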

nicknsy commented 1 year ago

I've gone ahead and added hw acceleration and tonemapping in 1.0.0.8, which should be up in a bit. As such, please create new issues if you find any bugs. I was only able to test on NVENC, and my AMD CPU doesn't support MJPEG encode, so who knows.

NeuroDawg commented 1 year ago

I just want to give a big thank you to all who made this possible. I have an Nvidia GPU and this change has resulted in a significant improvement.

With the old CPU-only processing, my CPU would be at 80-95% utilization for just over two hours (on average) to create a bif file for my 4K movies. With HW acceleration this morning, it took an average of 28 minutes for three 4K movies, and my CPU utilization never climbed above 15%.

Thank You!

satmandu commented 1 year ago

Thanks all... Some day hopefully, AMD GPUs will have mjpeg_vaapi support...

nyanmisaka commented 1 year ago

The 320x180 MJPEG SW encoding isn't that heavy. 4K decoding, downscaling, pixel format conversion, and tone mapping are the most compute-intensive tasks without GPU HWA.

satmandu commented 1 year ago

As an AMD GPU user, it would be nice to have the 4k decoding support using my AMD GPU then. :)