Open mertalev opened 11 months ago
Just adding my two cents about tone-mapping using Intel Quick Sync.
Calling ffmpeg with -vf "vpp_qsv=tonemap=1" enables hardware-accelerated tone-mapping with QSV, but it's only available when oneVPL is enabled when compiling FFmpeg. (When running FFmpeg's ./configure, just replace --enable-libmfx with --enable-libvpl; the Intel oneVPL library is needed, of course.) Given oneVPL's dispatching behavior, I'm not sure whether this would work with Intel processors before Tiger Lake.
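To check whether a given FFmpeg build meets the requirement above, something like the following sketch could help. It relies on ffmpeg's -buildconf and -filters listings; the two helper function names are made up for illustration, and passing both checks only suggests that vpp_qsv tone-mapping is compiled in, not that the hardware or driver supports it.

```shell
# Hypothetical helper: succeeds if the build configuration text mentions oneVPL.
built_with_onevpl() {
  case "$1" in
    *--enable-libvpl*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeeds if the filter listing includes vpp_qsv.
has_vpp_qsv() {
  case "$1" in
    *vpp_qsv*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only query ffmpeg if it is actually installed.
if command -v ffmpeg >/dev/null 2>&1; then
  buildconf=$(ffmpeg -hide_banner -buildconf 2>/dev/null)
  filters=$(ffmpeg -hide_banner -filters 2>/dev/null)
  if built_with_onevpl "$buildconf" && has_vpp_qsv "$filters"; then
    echo "oneVPL and vpp_qsv found; vpp_qsv=tonemap=1 should be available"
  else
    echo "this build lacks oneVPL or vpp_qsv"
  fi
fi
```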
The version of FFmpeg we use is built with oneVPL, so that shouldn't be an issue, but it does seem like this wouldn't work if it dispatches to Media SDK. Jellyfin docs mention that the main advantage of QSV's tonemapping is lower power consumption, but otherwise OpenCL has wider hardware compatibility and is more customizable. Maybe that's the direction to go in that case.
Would this also apply to thumbnail generation? It could be good for large photo libraries.
No, it wouldn't have an effect on images. But for live/motion photos, the video portion of these would benefit.
I was curious why Immich, even with hardware transcoding enabled, was basically maxing out my 16 CPU cores with only one transcode job running. It is also using only 15% of my GPU's render capability.
I went ahead and played with some of the ffmpeg options. Most of this is known but just adding my findings here:
Here is a sample immich ffmpeg call when using Intel QSV
ffmpeg -init_hw_device qsv=hw -filter_hw_device hw -i upload/upload/4ef.../...c2f.MOV -y -c:v hevc_qsv -c:a aac -movflags faststart -fps_mode passthrough -map 0:0 -map 0:1 -bf 7 -refs 5 -g 256 -v verbose -vf zscale=t=linear:npl=100,tonemap=hable:desat=0,zscale=p=bt709:t=bt709:m=bt709:range=pc,format=nv12,hwupload=extra_hw_frames=64,scale_qsv=1080:-1 -preset 7 -global_quality 23 upload/encoded-video/4ef.../...c7d.mp4
As stated in the original post, we aren't using hardware decoding. By enabling it, I see about a 5% reduction in CPU load.
I also get a 5% improvement by setting the preset to fast.
I am not super familiar with ffmpeg, but the remainder of the extra CPU load is coming from the filters. Is there a reason we need to do tone-mapping and all the zscale options? If I trim it down to the following, I get about a 75% reduction in CPU load.
/usr/lib/jellyfin-ffmpeg/ffmpeg -init_hw_device qsv=hw -filter_hw_device hw -c:v hevc_qsv -i /config/test.MOV -y -c:v hevc_qsv -c:a aac -movflags faststart -fps_mode passthrough -map 0:0 -map 0:1 -bf 7 -refs 5 -g 256 -v verbose -vf format=nv12,hwupload=extra_hw_frames=64,scale_qsv=1080:-1 -preset fast -global_quality 23 /config/test_OUT.mp4
@rishid Thumbnail generation still uses the CPU. If you don't have machine learning set up to use the GPU, it will also use the CPU.
Sure, understood, but specifically the single parent ffmpeg process, which is doing the video transcoding for encoded-videos, is the one showing CPU usage of ~800% on my machine.
Unsure, perhaps passing through configuration is not right?
I completely forgot there are a lot of config knobs for Video Transcoder settings available in Immich - I think all my observations can be controlled already.
For Quick Sync, I got VPP tone-mapping working, but OpenCL doesn't work (something about not being able to allocate memory to the OpenCL device) and Vulkan is almost thrice as slow because it doesn't support zero-copy like it does for CUDA. VPP doesn't have the tone-mapping settings we use for other backends, but it is also the fastest option and tailored specifically for Intel devices. I can use that for QSV and let VAAPI use OpenCL (once I figure out how to get it to work).
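As a rough illustration of the difference between the two approaches, here is a sketch that prints either filter chain. The software chain is copied from the sample Immich call earlier in the thread; the VPP chain combines the vpp_qsv=tonemap=1 filter mentioned above with the same upload and scale steps. The build_vf name and the p010 upload format are assumptions (HDR sources are typically 10-bit), and the VPP chain has not been verified on hardware.

```shell
# Print a -vf filter chain for the given tone-mapping mode.
#   software: CPU zscale/tonemap chain from the sample Immich call in this thread
#   vpp:      hypothetical chain that uploads first, then tone-maps with vpp_qsv
build_vf() {
  case "$1" in
    software)
      echo "zscale=t=linear:npl=100,tonemap=hable:desat=0,zscale=p=bt709:t=bt709:m=bt709:range=pc,format=nv12,hwupload=extra_hw_frames=64,scale_qsv=1080:-1"
      ;;
    vpp)
      # Assumption: upload as p010 for 10-bit HDR input; untested on hardware.
      echo "format=p010,hwupload=extra_hw_frames=64,vpp_qsv=tonemap=1,scale_qsv=1080:-1"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

# Show the chain that would be passed to ffmpeg's -vf option.
build_vf vpp
```

In the VPP variant the frames reach the GPU before tone-mapping, so the zscale and tonemap work that was loading the CPU drops out of the chain.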
This is a tracking issue for adding hardware decoding and tone-mapping support for transcoding.
What
Hardware-accelerated decoding loads videos to an acceleration device to decode with the device's built-in support for certain codecs and formats. This differs from software decoding, where videos are instead loaded and decoded by a program.
Hardware-accelerated tone-mapping is similarly performed within the acceleration device, but takes place after decoding.
Why
Hardware decoding is good for a number of reasons.
Concerns
Tasks