iamjen023 opened this issue 4 years ago
For file transcoding we use x264 with the faster preset, and that's probably the only place nvenc might be quicker (not sure about quality). BUT file transcoding IMHO is not that needed anymore, since we now have live stream transcoding for unsupported files. (IMHO the transcodes in the generated content section should be left unticked in 99% of cases.)
For live transcoding we have vp9 webm files, which are not supported through hardware encoding except through Intel VAAPI I think, and I'm not sure about the stability/quality of that either.
Finally, for the generated previews and markers we use x264 with the veryslow preset to get the highest quality, since they are only generated once but viewed many times. If you wanted to make generation faster, that's where we could perhaps offer to change the veryslow preset to medium or even fast and still get better quality/performance than hardware encoders. That's of course only for anyone willing to trade quality for speed, and only as an extra option, not the default.
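If you want a feel for what that preset change would buy on your own hardware, a quick test like this works (file name, seek offset and clip length are just placeholders, not stash's actual values):

```bash
# Encode the same 10-second clip twice; compare wall-clock time and file size:
time ffmpeg -ss 60 -i sample.mp4 -t 10 -an -c:v libx264 -preset veryslow -crf 21 -y out_veryslow.mp4
time ffmpeg -ss 60 -i sample.mp4 -t 10 -an -c:v libx264 -preset medium -crf 21 -y out_medium.mp4
```

At the same CRF, the faster preset trades file size (and some quality per bit) for encoding time.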
The only way live streaming would even remotely be viable here is with hardware acceleration; software-bound encoding is a no-go, and VP9 is even worse. I am using an FX-6300, which was not optimized for these tasks, to put it kindly. The people asking for this feature need it. They do not care about the fabled and scary quality loss.
I'd add that after Pascal on the Nvidia side, hardware encoding with their GPUs is leaps and bounds better, comparable to CPU-based x264 up to the medium preset I believe, while being much faster.
I too would be very pleased about this feature. It does not have to be as user-friendly as, for example, Jellyfin with its hardware encoding support. It could be an advanced setting to add parameters for ffmpeg (playback/live transcoding or preview generation). If it causes problems, users could just set it back to default, but advanced users would be able to fiddle with it a little more.
It's quite easy to pass vaapi support to docker containers, and hardware encoding would greatly benefit my high cpu loads.
Can someone share the command line that is used when generating these video previews? The only thing I see at the moment is this, when generation fails on WMVs sometimes:
ffmpeg.exe -v error -xerror -ss 85.32 -i F:\Downloads\1.wmv -t 0.75 -max_muxing_queue_size 1024 -y -c:v libx264 -pix_fmt yuv420p -profile:v high -level 4.2 -preset fast -crf 21 -threads 4 -vf scale=640:-2 -c:a aac -b:a 128k -strict -2 C:\Users\username\.stash-data\tmp\preview013.mp4: F:\Downloads\1.wmv: corrupt decoded frame in stream 1
I would like to play around and at least see how GPU encode would help.
It took me 6 hours to generate video previews for 700 videos, each approximately 0.5-4 GB. 20 segments, 3% skip on both ends, fast preset, i5 4460.
Can someone share the command line that is used when generating these video previews? The only thing I see at the moment is this, when generation fails on WMVs sometimes:
ffmpeg.exe -v error -xerror -ss 85.32 -i F:\Downloads\1.wmv -t 0.75 -max_muxing_queue_size 1024 -y -c:v libx264 -pix_fmt yuv420p -profile:v high -level 4.2 -preset fast -crf 21 -threads 4 -vf scale=640:-2 -c:a aac -b:a 128k -strict -2 C:\Users\username\.stash-data\tmp\preview013.mp4
That is the command for generating a preview segment. In your case that's run 20 times for each video, with the results spliced together into the final preview. You can cut down on generation time by choosing fewer segments and setting the encoding preset to ultrafast.
We're investigating hardware acceleration for transcoding, but I have no idea if it's going to be useful for generation seeing as hw acceleration likely has more startup latency.
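Roughly, the generation works like the sketch below (a simplified illustration, not our exact invocation; segment offsets, counts and file names are made up, and audio is omitted):

```bash
# Encode 20 short segments, then splice them with the concat demuxer:
for i in $(seq 0 19); do
  ffmpeg -v error -ss $((i * 60)) -i input.mp4 -t 0.75 \
    -c:v libx264 -preset ultrafast -an -y "seg$i.mp4"
done
printf "file 'seg%d.mp4'\n" $(seq 0 19) > list.txt
ffmpeg -f concat -safe 0 -i list.txt -c copy preview.mp4
```

The final splice is a stream copy, so only the 20 tiny encodes cost CPU time.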
Just tried NVENC in the HandBrake app to see the difference on some random file. After 1 minute, the CPU had encoded only 1 min 30 sec of a simple 220 MB 720p WMV file. In comparison, an RTX 2070S managed to encode the entire 5-minute video within that same 1-minute window.
Can I currently create my own build, edit the hardcoded command line, and make use of NVENC? Is that possible?
Can I currently create my own build, edit the hardcoded command line, and make use of NVENC? Is that possible?
Seconded.
Any news on this? This is how Jellyfin handles GPU transcoding, GUI-wise.
And this is the ffmpeg command
ffmpeg -vaapi_device /dev/dri/renderD128 -i file:"INPUT.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_vaapi -b:v 6621920 -maxrate 6621920 -bufsize 13243840 -force_key_frames:0 "expr:gte(t,0+n_forced*3)" -g 72 -keyint_min 72 -sc_threshold 0 -vf "format=nv12|vaapi,hwupload,scale_vaapi=w=1022:h=574:format=nv12" -start_at_zero -vsync -1 -codec:a:0 aac -ac 6 -ab 256000 -copyts -avoid_negative_ts disabled -f hls -max_delay 5000000 -hls_time 3 -individual_header_trailer 0 -hls_segment_type mpegts -start_number 0 -hls_segment_filename "OUTPUT.ts" -hls_playlist_type vod -hls_list_size 0 -y "SOMEPLAYLISTIDONTKNOW.m3u8"
It is very performant and easy on the CPU.
Jellyfin uses a different player, so HLS is supported; that's not the case for Stash, as JWPlayer's HLS support depends on the browser AFAIK. This makes it more complicated to adapt.
For generating previews I found that this really doesn't help much. Since previews are converted only 0.75 seconds at a time, the overhead of creating and concatenating (twelve 0.75-second clips) is probably a lot more than generating these individual bursts. Here's what my GPU graph looked like: notice only very sparse spikes of usage (as opposed to continuous usage when converting larger files), even with 12 parallel tasks, while the CPU was still at 100% the whole time (doing the preparation and other processing). Overall it did not help much.
If anyone wants to test, change "-c:v", "libx264" to hevc_nvenc here.
There's a few subtle issues involved in hardware encoding beyond what's been mentioned here (I rambled about them a bit in https://github.com/stashapp/stash/issues/894#issuecomment-867616713):

* Hardware encoders are pickier about input formats, color spaces, etc.
* ffmpeg can handle the conversion, but that's in-software, so you're back to CPU-intensive work even with hardware encoding.
* Setting that up means keeping lists of all the formats each hardware encoder supports, comparing the format of the source file, and invoking inline conversion only when needed.
* Hardware encoding on consumer-grade GPUs is usually artificially limited to no more than N encodes at once (nvidia limits it to 2); this can be fixed by the user patching the driver, but it's an annoyance regardless. There's no way to auto-detect the current limit either; the drivers won't report it.
* Fallback to software needs to be implemented to handle cases where the hardware encoder fails (bogus input format, too many encodes in progress, solar flares, etc.).

Now on the plus side:

* Hardware _decoding_ could potentially speed things up during hardware encoding _if_:
  * the source format is supported by the decoder (hardware decoders usually do support more formats than the encoders), and
  * the entire job can be done in a single invocation of _ffmpeg_ (the biggest speedup comes from keeping all the work and data on the GPU, because that avoids some expensive copies to/from main/video memory). From my understanding _stash_ currently invokes _ffmpeg_ multiple times (once per desired segment), and invoking it a single time to do the same thing is slower because it slurps in the entire video instead of just seeking to each segment, so again this speedup might not be worth it unless a way can be found to get _ffmpeg_ to be more efficient about this.
I don't think hardware decoding will help at all at the moment though given how ffmpeg is currently used. Reading compressed data and decoding it on-CPU versus initializing the GPU decoder, reading the compressed data, shipping it to GPU memory, waiting for the decode and then shipping the output back to main memory -- I think software-only is faster in that case.
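For reference, the "keep everything on the GPU" case looks roughly like this (assumes an ffmpeg build with CUDA/NVDEC/NVENC support; the filter and target resolution are illustrative):

```bash
# Decode, scale and encode entirely on the GPU; frames never round-trip
# through main memory between the decoder, scaler and encoder:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -vf scale_cuda=1280:720 -c:v h264_nvenc -c:a copy -y output.mp4
```

Dropping -hwaccel_output_format cuda forces the decoded frames back into system memory, which is exactly the expensive copy described above.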
I think the problem is not that the software decoder is bad, but, for instance, I have files that will ramp my CPU cores up to 100%, interfering with other services that also need those cores (the very same thing is true when transcoding in software with Plex).
I have a pretty old CPU (4790K) and it has a lot of trouble playing some files because the CPU simply can't keep up. The GPU however is a pretty decent one (GTX 1070) and has no problem doing multiple 4K hardware transcodes simultaneously without my CPU ramping up to 100%.
I understand that this is probably too hard to implement (or that people don't see the benefits of it) and thus will probably never come to Stash, but I wish it would. Yes, of course I can transcode by generating the files, but that takes up disk space.
About 1/3 of my library is HEVC, in either 720p/1080p. The software transcoder starts to struggle if I try outputting to anything higher than 720p. I use Firefox on everything, which doesn't support HEVC for licensing reasons, so it's always transcoding and tying up the host CPU.
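For anyone wondering how much of their own library is affected, something like this lists the HEVC files (paths and extensions are placeholders):

```bash
# Print every file whose first video stream is HEVC:
find /media -type f \( -name '*.mp4' -o -name '*.mkv' \) | while read -r f; do
  codec=$(ffprobe -v error -select_streams v:0 -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$f")
  [ "$codec" = "hevc" ] && echo "$f"
done
```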
I experimented with building Stash on top of the nvidia/cuda docker stack and was able to achieve hardware accelerated decoding and encoding. I'm pretty impressed with the results. I let a 1080p HEVC video stream in H264 for about 5 minutes: CPU load stayed around 1.00 while FFMPEG quickly filled the buffer and throttled the GPU. I noticed the biggest difference when using both NVDEC and NVENC; just enabling one didn't seem to affect CPU usage much. I'm using a GTX 1650 with a Ryzen 5 3600.
I don't know Golang, my changes are pretty hacky and this isn't robust enough for a PR. But it works as a proof of concept and I'm sure someone wiser can implement this properly. I did notice unintended behavior when accessing stash over a reverse proxy + SSL: FFMPEG would peg the GPU at 100%, then fail after about 3 minutes of playing a video. This is probably due to my own nginx misconfiguration; it did not occur when accessing Stash directly.
Here is my modified Dockerfile from docker/build/x86_64/Dockerfile.
I changed the video codec in pkg/ffmpeg/codec.go on line 14:
VideoCodecLibX264 VideoCodec = "h264_nvenc"
And the ffmpeg arguments for StreamFormatH264 in pkg/ffmpeg/stream.go, starting on line 68. I found the "+" in front of frag_keyframe was strictly necessary, but the rest I tuned according to preference because the default quality was quite poor.
StreamFormatH264 = StreamFormat{
    codec:    VideoCodecLibX264,
    format:   FormatMP4,
    MimeType: MimeMp4,
    extraArgs: []string{
        "-acodec", "aac",
        "-pix_fmt", "yuv420p",
        "-movflags", "+frag_keyframe+empty_moov",
        "-preset", "llhp",
        "-rc", "vbr",
        "-zerolatency", "1",
        "-temporal-aq", "1",
        "-cq", "24",
    },
}
Running make docker-build after this should produce a Stash container capable of GPU encoding. For decoding, I set -hwaccel auto as a setting in the interface under "FFmpeg LiveTranscode Input Args". Setting it globally like this broke the other transcode formats where hardware accelerated decoding is not possible (like WebM, the default transcode target). I commented out the WebM scene routes and endpoints in internal/api/routes_scene.go as a workaround, so it always falls back to MP4.
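To actually get GPU access at runtime, I then run the image with something like this (the tag and paths are placeholders for my own setup; it requires the NVIDIA Container Toolkit on the host):

```bash
docker run -d --gpus all -p 9999:9999 \
  -v /path/to/config:/root/.stash \
  -v /path/to/media:/data \
  my-stash-nvenc:latest   # whatever tag your build produced
```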
One of the obstacles mentioned by @willfe was the transcode limit imposed by the Nvidia drivers. I didn't try this because my host is already patched, but the transcode limit patch can be integrated into docker containers so the user doesn't have to bother with it.
I think the missing piece to a possible all-in-one Stash container for hardware transcoding is the logic to determine when to use it, which is tricky depending on the particular architecture of GPU the user has - even with the Nvidia CUDA tools.
Edit: Wow, preview generation is almost instantaneous.
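On the detection logic mentioned above, one simple approach would be to probe each encoder with a tiny test encode and fall back on failure; a sketch, not what stash currently does:

```bash
# A null encode of a 0.1s synthetic clip; exits non-zero if the encoder
# (or the driver/device behind it) is unusable:
if ffmpeg -v error -f lavfi -i color=c=black:s=256x256:d=0.1 \
    -c:v h264_nvenc -f null - 2>/dev/null; then
  echo "h264_nvenc works, use it"
else
  echo "probe failed, fall back to libx264"
fi
```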
Would a similar technique allow for QuickSync transcoding?
Would a similar technique allow for QuickSync transcoding?
AFAIK QuickSync leverages LibVA, so as long as the host has the supporting libraries, it would just be a matter of exposing the video card to the container like this: --device /dev/dri/renderD128
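A quick sanity check on the host before blaming the container (vainfo comes from the libva-utils package):

```bash
ls -l /dev/dri/       # should list card0 and renderD128
vainfo                # lists the VAAPI driver and supported profiles
# then expose the render node to the container:
docker run -d --device /dev/dri/renderD128 -p 9999:9999 stashapp/stash:latest
```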
There is an open PR https://github.com/stashapp/stash/pull/3419 btw if anyone is interested in testing or providing some feedback
Would a similar technique allow for QuickSync transcoding?
AFAIK QuickSync leverages LibVA, so as long as the host has the supporting libraries, it would just be a matter of exposing the video card to the container like this: --device /dev/dri/renderD128
exactly what I was hoping you'd say
Edit: maybe getting ahead of myself, but this is the guide I used for exposing the card with Plex (it shows commands to list available devices etc., and is Synology-specific, but it may work for others):
https://medium.com/@MrNick4B/plex-on-docker-on-synology-enabling-hardware-transcoding-fa017190cad7
I have an unusual NAS with a Rockchip RK3399 ARM CPU. It does support hardware decoding with the h264_rkmpp and hevc_rkmpp decoders. I believe I need to compile ffmpeg myself to use these decoders, which I have not bothered with yet.
Would it be possible to have a setting to specify the extra command line arguments for edge cases like this?
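In the meantime, you can at least check whether a given ffmpeg build has them (the rkmpp decoders only appear in builds configured with --enable-rkmpp):

```bash
ffmpeg -hide_banner -decoders | grep rkmpp
# a supporting build lists entries such as h264_rkmpp and hevc_rkmpp
```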
Great news, hardware encoding is now merged and ready for testing for anyone willing. It should work for:

NVENC: use the CUDA build via make docker-cuda-build; this makes the docker tag stash/cuda-build:latest. You will additionally need to specify the args --runtime=nvidia --gpus all --device /dev/nvidiactl --device /dev/nvidia0.

VAAPI/QSV: CUDA build and the arg --device=/dev/dri.

Note that RPI and VAAPI don't support direct file transcode for h264 (mp4), so h264 hardware transcoding is only used for HLS (h264). Note that the normal Docker build only supports VAAPI and v4l2m2m.

You can check the logs for which codecs were found and enabled, and check the debug log for why they failed.
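Putting the NVENC pieces together, a full example run would look like this (config and media paths are placeholders):

```bash
make docker-cuda-build
docker run -d --runtime=nvidia --gpus all \
  --device /dev/nvidiactl --device /dev/nvidia0 \
  -v /path/to/config:/root/.stash \
  -v /path/to/media:/data \
  -p 9999:9999 stash/cuda-build:latest
```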
Having this enabled on my Unraid 6.11.5 Server (Intel Celeron J3455) reports back no available HW codecs.
23-03-10 13:10:57 Info [InitHWSupport] Supported HW codecs:
Plex is managing to use hw acceleration just fine, so not sure where to start looking here.
My docker-compose.yml already includes the device passthrough
devices:
- "/dev/dri/card0:/dev/dri/card0"
- "/dev/dri/renderD128:/dev/dri/renderD128"
Any idea/tips how to get more information for this?
Do you also have the intel-gpu-top plugin installed and have rebooted afterwards?
Having this enabled on my Unraid 6.11.5 Server (Intel Celeron J3455) reports back no available HW codecs.
23-03-10 13:10:57 Info [InitHWSupport] Supported HW codecs:
Plex is managing to use hw acceleration just fine, so not sure where to start looking here.
My docker-compose.yml already includes the device passthrough
devices: - "/dev/dri/card0:/dev/dri/card0" - "/dev/dri/renderD128:/dev/dri/renderD128"
Any idea/tips how to get more information for this?
When stash starts, go into the webui -> settings -> logs and set the log level to debug, then find the entry with codec h264_qsv and send the specific error.
Do you also have the intel-gpu-top plugin installed and have rebooted afterwards?
No, I didn't see this in the docs or in the commit. Is it used by stash or just for debugging? As the Linux on Unraid servers has no package manager, it's kinda hard to build packages for it on your own.
When stash starts, go into the webui -> settings -> logs and set the log level to debug, then find the entry with codec h264_qsv and send the specific error.
Switching to debug or even trace shows nothing more from the server startup.
When starting stash, the only hint of HW acceleration is [InitHWSupport] Supported HW codecs:. When I try to live transcode it just works, but as slow as it did with CPU only, and the logs show nothing related to HW acceleration (tried HLS, WebM and DASH, all running slow, apparently without HW acceleration).
2023-03-10 13:30:03 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-v_1080 at segment #0
2023-03-10 13:30:03 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-a_1080 at segment #0
2023-03-10 13:30:02 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-v_1080 at segment #0
2023-03-10 13:30:02 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-a_1080 at segment #0
2023-03-10 13:30:02 Debug [transcode] returning DASH manifest for scene 4711
2023-03-10 13:29:53 Debug [transcode] returning DASH manifest for scene 4711
2023-03-10 13:28:30 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_hls at segment #0
2023-03-10 13:28:29 Debug [transcode] returning HLS manifest for scene 4711
2023-03-10 13:28:10 Debug [transcode] streaming scene 4711 as video/webm
2023-03-10 13:28:08 Debug [transcode] streaming scene 4711 as video/webm
Do you also have the intel-gpu-top plugin installed and have rebooted afterwards?
No, I didn't see this in the docs or in the commit. Is it used by stash or just for debugging? As the Linux on Unraid servers has no package manager, it's kinda hard to build packages for it on your own.
AFAIK Unraid doesn't have drivers for QSV by default? I'd been looking into it before putting my Plex install into a container and moving it over, and came across this guide. Figured I'd probably have to do the same thing for this container, no? I've been asleep most of the time this release has been out, so I haven't had a chance to try it with Stash.
https://forums.unraid.net/topic/77943-guide-plex-hardware-acceleration-using-intel-quick-sync/ (sorry, that's the original, hard way to do it)
Just installed both the GPU Statistics and Intel-GPU-Top apps for Unraid, and the installation log said Intel Kernel Module already enabled, so I guess it already has the drivers etc.
The other part of the tutorial is already covered by my shared docker-compose.yml config, where I pass through the devices.
Still the same result for stash after startup in the log.
I use Unraid and already had the two plugins installed: Intel GPU top and GPU statistics. I set the devices: `devices:`
But only my CPU is used :/ I don't have dev skills, but I can test things if someone tells me what to try!
qsv is not loaded for me with device passthrough in docker compose, which works on Jellyfin:
devices:
- /dev/dri:/dev/dri
Here is the stash log:
stash is running at http://localhost:9999/
2023-03-12 13:11:35 Info stash is listening on 0.0.0.0:9999
2023-03-12 13:11:35 Info stash version: v0.19.1-56-g9aa7ec57 - Official Build - 2023-03-10 22:31:13
2023-03-12 13:11:35 Info [InitHWSupport] Supported HW codecs:
2023-03-12 13:11:35 Debug [InitHWSupport] Codec vp9_vaapi not supported. Error output:
[AVHWDeviceContext @ 0x7fb07a26de00] Failed to initialise VAAPI connection: -1 (unknown libva error).
Device creation failed: -5.
Failed to set value '/dev/dri/renderD128' for option 'vaapi_device': I/O error
Error parsing global options: I/O error
2023-03-12 13:11:35 Debug [InitHWSupport] Codec vp9_qsv not supported. Error output:
Device creation failed: -12.
Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory
Error parsing global options: Out of memory
2023-03-12 13:11:35 Debug [InitHWSupport] Codec h264_v4l2m2m not supported. Error output:
[h264_v4l2m2m @ 0x7efe12e10880] Could not find a valid device
[h264_v4l2m2m @ 0x7efe12e10880] can't configure encoder
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
2023-03-12 13:11:35 Debug [InitHWSupport] Codec h264_vaapi not supported. Error output:
[AVHWDeviceContext @ 0x7fd6cc337e00] Failed to initialise VAAPI connection: -1 (unknown libva error).
Device creation failed: -5.
Failed to set value '/dev/dri/renderD128' for option 'vaapi_device': I/O error
Error parsing global options: I/O error
2023-03-12 13:11:35 Debug [InitHWSupport] Codec h264_qsv not supported. Error output:
Device creation failed: -12.
Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory
Error parsing global options: Out of memory
2023-03-12 13:11:35 Debug [InitHWSupport] Codec h264_nvenc not supported. Error output:
Unrecognized option 'rc'.
Error splitting the argument list: Option not found
2023-03-12 13:11:34 Debug Reading scraper configs from /root/.stash/scrapers
2023-03-12 13:11:34 Debug Reading plugin configs from /root/.stash/plugins
2023-03-12 13:11:34 Info using config file: /root/.stash/config.yml
Running ffmpeg -encoders on the host outputs h264_qsv, whereas within the shell in the stash docker it does not.
2023-03-12 13:11:35 Debug [InitHWSupport] Codec h264_qsv not supported. Error output: Device creation failed: -12. Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory Error parsing global options: Out of memory
Multiple things here. Firstly, I checked Alpine Linux (what the stash docker is built on): they don't compile any hardware codecs for ffmpeg.
You should try using the CUDA build, which is built on Ubuntu and should have most hardware codecs.
Another thing is that it says Out of memory, so I don't have high hopes that switching to Ubuntu will work any better.
Binhex uses Arch as their base image for Plex.
2023-03-12 13:11:35 Debug [InitHWSupport] Codec h264_qsv not supported. Error output: Device creation failed: -12. Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory Error parsing global options: Out of memory
Multiple things here. Firstly, I checked Alpine Linux (what the stash docker is built on): they don't compile any hardware codecs for ffmpeg. You should try using the CUDA build, which is built on Ubuntu and should have most hardware codecs. Another thing is that it says Out of memory, so I don't have high hopes that switching to Ubuntu will work any better.
Any idea what Out of memory means here? Because I have at least 8 GB of RAM free.
2023-03-12 13:11:35 Debug [InitHWSupport] Codec h264_qsv not supported. Error output: Device creation failed: -12. Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory Error parsing global options: Out of memory
Multiple things here. Firstly, I checked Alpine Linux (what the stash docker is built on): they don't compile any hardware codecs for ffmpeg. You should try using the CUDA build, which is built on Ubuntu and should have most hardware codecs. Another thing is that it says Out of memory, so I don't have high hopes that switching to Ubuntu will work any better.
Any idea what Out of memory means here? Because I have at least 8 GB of RAM free.
The technical command is ffmpeg -init_hw_device qsv=hw -filter_hw_device hw -f lavfi -i color=c=red -t 0.1 -c:v h264_qsv -global_quality 20 -preset faster -vf hwupload=extra_hw_frames=64,format=qsv,scale_qsv=-1:160 -f null -, which apparently fails for you. I would suspect that you have an artificial limit on the docker container's memory; you can check with docker stats --no-stream.
Just found out that I had an old config and was therefore not seeing debug messages in my log. After enabling them, I now see the same Out of memory error message:
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec h264_nvenc not supported. Error output:\nUnrecognized option 'rc'.\nError splitting the argument list: Option not found\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec h264_qsv not supported. Error output:\nDevice creation failed: -12.\nFailed to set value 'qsv=hw' for option 'init_hw_device': Out of memory\nError parsing global options: Out of memory\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec h264_vaapi not supported. Error output:\n[AVHWDeviceContext @ 0x147c11ae7e00] Failed to initialise VAAPI connection: -1 (unknown libva error).\nDevice creation failed: -5.\nFailed to set value '/dev/dri/renderD128' for option 'vaapi_device': I/O error\nError parsing global options: I/O error\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec h264_v4l2m2m not supported. Error output:\n[h264_v4l2m2m @ 0x14eea5bff940] Could not find a valid device\n[h264_v4l2m2m @ 0x14eea5bff940] can't configure encoder\nError initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec vp9_qsv not supported. Error output:\nDevice creation failed: -12.\nFailed to set value 'qsv=hw' for option 'init_hw_device': Out of memory\nError parsing global options: Out of memory\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec vp9_vaapi not supported. Error output:\n[AVHWDeviceContext @ 0x14b4e4e92e00] Failed to initialise VAAPI connection: -1 (unknown libva error).\nDevice creation failed: -5.\nFailed to set value '/dev/dri/renderD128' for option 'vaapi_device': I/O error\nError parsing global options: I/O error\n"
time="2023-03-13 19:36:38" level=info msg="[InitHWSupport] Supported HW codecs:\n"
Docker stats returns
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
4cf9081a4a11 stash 0.16% 9.988MiB / 7.627GiB 0.13% 20.6kB / 3.66kB 0B / 0B 10
I made sure that the /dev/dri folder has all rights; I'm using the latest development docker release.
Are you passing /dev/dri to the container as a volume? Or as a device? (It should be the second one)
Are you passing /dev/dri to the container as a volume? Or as a device? (It should be the second one)
My docker-compose.yml has both entries configured as devices:
devices:
- "/dev/dri/card0:/dev/dri/card0"
- "/dev/dri/renderD128:/dev/dri/renderD128"
I tried using only one of them at a time, and also tried limiting the container's memory as well as reserving more, which didn't change the error messages at all.
services:
  stash:
    image: stashapp/stash:development
    # [...]
    mem_limit: 2048m
    mem_reservation: 1024M
Still Out of memory in all cases.
Are you passing /dev/dri to the container as a volume? Or as a device? (It should be the second one)
My docker-compose.yml has both entries configured as devices:
devices:
- "/dev/dri/card0:/dev/dri/card0"
- "/dev/dri/renderD128:/dev/dri/renderD128"
I tried using only one of them at a time, and also tried limiting the container's memory as well as reserving more, which didn't change the error messages at all.
services:
  stash:
    image: stashapp/stash:development
    # [...]
    mem_limit: 2048m
    mem_reservation: 1024M
Still Out of memory in all cases.
Could you try modifying the docker build to add:
For the alpine build: RUN apk add --no-cache mesa-dri-gallium libva-intel-driver
For the cuda build: RUN apt install intel-media-va-driver-non-free -y
below the other apk add or apt install lines.
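If rebuilding is a hassle, the same theory can be tested against the running container first (the container name stash is a placeholder; changes like this are lost when the container is recreated):

```bash
docker exec -u root stash apk add --no-cache mesa-dri-gallium libva-intel-driver
docker restart stash   # stash probes the HW codecs at startup
```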
I tried the image by CarlNs92891 (who deleted their message or got it deleted, I don't know), which does
apt install libvips-tools ffmpeg musl
apt install intel-media-va-driver-non-free vainfo
and with that it works!
Can one of the maintainers say what the current blocker is here?
I would really like to have this running in the official docker images, so I can use watchtower auto-updates for my containers; self-building with these tricks is therefore not a good option for me.
How about having the CUDA image also auto-released to Docker Hub as stashapp/stash:CUDA-latest or similar?
QSV works I think?
QSV works I think?
I'm not sure anymore what kind of hardware encoding works on my NAS, but apparently it's not QSV.
I still get [InitHWSupport] Supported HW codecs: in my logs with the default latest docker image.
With the CUDA image it works, but it's not published on Docker Hub, which is my main issue currently.
QSV transcoding works fine with the CUDA build. I agree, it would be great if the maintainers could build and publish this on Docker Hub.
I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5, as it includes all the usermode drivers for QSV and NVENC, plus various optimisations for full hardware transcoding. This is working really well for me and the performance is great (I have a 12th gen Intel iGPU). My only problem with the CUDA build and my jellyfin-ffmpeg build is that sprite/preview generation does not utilise hardware acceleration. Unfortunately I don't have the knowledge to debug and fix it.
As the Jellyfin maintainers have already done a lot of hard work optimising hardware transcoding in ffmpeg, would it make sense for stash to work towards implementing their version?
There's a few subtle issues involved in hardware encoding beyond what's been mentioned here (I rambled about them a bit in #894 (comment)):
* Hardware encoders are pickier about input formats, color spaces, etc.
* ffmpeg can handle the conversion, but that's in-software, so you're back to CPU-intensive work even with hardware encoding.
* Setting that up means keeping lists of all the formats each hardware encoder supports, comparing the format of the source file, and invoking inline conversion only when needed.
* Hardware encoding on consumer-grade GPUs is usually artificially limited to no more than N encodes at once (nvidia limits it to 2); this can be fixed by the user patching the driver, but it's an annoyance regardless. There's no way to auto-detect the current limit either; the drivers won't report it.
* Fallback to software needs to be implemented to handle cases where the hardware encoder fails (bogus input format, too many encodes in progress, solar flares, etc.).

Now on the plus side:

* Hardware _decoding_ could potentially speed things up during hardware encoding _if_:
  * the source format is supported by the decoder (hardware decoders usually do support more formats than the encoders), and
  * the entire job can be done in a single invocation of _ffmpeg_ (the biggest speedup comes from keeping all the work and data on the GPU, because that avoids some expensive copies to/from main/video memory). From my understanding _stash_ currently invokes _ffmpeg_ multiple times (once per desired segment), and invoking it a single time to do the same thing is slower because it slurps in the entire video instead of just seeking to each segment, so again this speedup might not be worth it unless a way can be found to get _ffmpeg_ to be more efficient about this.
I don't think hardware decoding will help at all at the moment though given how ffmpeg is currently used. Reading compressed data and decoding it on-CPU versus initializing the GPU decoder, reading the compressed data, shipping it to GPU memory, waiting for the decode and then shipping the output back to main memory -- I think software-only is faster in that case.
Nvidia limited you to 3 encodes, not 2. And they recently changed it to 5.
QSV transcoding works fine with the CUDA build. I agree, it would be great if the maintainers could build and publish this on Docker Hub.
I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5, as it includes all the usermode drivers for QSV and NVENC, plus various optimisations for full hardware transcoding. This is working really well for me and the performance is great (I have a 12th gen Intel iGPU). My only problem with the CUDA build and my jellyfin-ffmpeg build is that sprite/preview generation does not utilise hardware acceleration. Unfortunately I don't have the knowledge to debug and fix it.
As the Jellyfin maintainers have already done a lot of hard work optimising hardware transcoding in ffmpeg, would it make sense for stash to work towards implementing their version?
Mind sharing the build file?
@algers nerethos shared this dockerhub link in the discord for the jellyfin-ffmpeg5 build: https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg
Just wanted to add another data point: I wasn't able to get QSV working on an Alder Lake chip, but the jellyfin + CUDA build linked above worked out of the box. Hopefully we can get better HW encoding support added to the release build in the near future.
@algers nerethos shared this dockerhub link in the discord for the jellyfin-ffmpeg5 build: https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg
Just wanted to add another data point: I wasn't able to get QSV working on an Alder Lake chip, but the jellyfin + CUDA build linked above worked out of the box. Hopefully we can get better HW encoding support added to the release build in the near future.
This works perfectly. Any chance of it being updated to match the newest release :D
When I pull this image https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg instead of my installed nightly version, it crashes (probably the two are not swappable because of the date difference). I hope I'll soon be able to use HW transcoding within my "classic" stash install.
QSV transcoding works fine with the CUDA build. I agree, it would be great if the maintainers could build and publish this on Docker Hub.
I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5, as it includes all the usermode drivers for QSV and NVENC, plus various optimisations for full hardware transcoding. This is working really well for me and the performance is great (I have a 12th gen Intel iGPU). My only problem with the CUDA build and my jellyfin-ffmpeg build is that sprite/preview generation does not utilise hardware acceleration. Unfortunately I don't have the knowledge to debug and fix it.
As the Jellyfin maintainers have already done a lot of hard work optimising hardware transcoding in ffmpeg, would it make sense for stash to work towards implementing their version?
@nerethos Can this version use VAAPI to transcode with an old iGPU?
I got this working with an iGPU on the current docker image release:
apk add libva-intel-driver
and set -hwaccel auto in the FFmpeg input args.
Hope this helps someone.
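To verify VAAPI actually works inside the container after installing the driver, a quick smoke test (using the same render node passed through elsewhere in this thread):

```bash
# Encodes one second of synthetic video with h264_vaapi and discards the
# output; any driver/device problem shows up as an error:
ffmpeg -v error -vaapi_device /dev/dri/renderD128 \
  -f lavfi -i testsrc=duration=1:size=640x360 \
  -vf 'format=nv12,hwupload' -c:v h264_vaapi -f null -
```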
I got this working with an iGPU on the current docker image release.
I'm running Unraid with a 13th gen Intel chip. I haven't had a chance to try this yet because I'm still building the server, but does this work with sprite and preview generation? Those are the only 2 things I need from hardware accel.
I got this working with an iGPU on the current docker image release.
I'm running Unraid with a 13th gen Intel chip. I haven't had a chance to try this yet because I'm still building the server, but does this work with sprite and preview generation? Those are the only 2 things I need from hardware accel.
I got this working on Unraid as well, but with a much older CPU (Ivy Bridge). Generation tasks still don't use hardware acceleration, just transcoding tasks. I'm not sure Stash supports this, as I didn't find any config options related to it.
I would love NVIDIA NVENC for transcoding and generation; this would also work with AMD and Intel encoders, and could speed up the generation process.