stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc
https://stashapp.cc/
GNU Affero General Public License v3.0

[Feature] hardware encoding #305

Open iamjen023 opened 4 years ago

iamjen023 commented 4 years ago

I would love to see NVIDIA NVENC support for transcoding and generation. This would also work with AMD and Intel encoders, and could speed up the generation process.

bnkai commented 4 years ago

For file transcoding we use x264 with the faster preset, and that's probably the only place it might be quicker with NVENC (not sure about quality). BUT file transcoding IMHO is not that needed anymore, since we now have live stream transcoding for unsupported files. (IMHO transcodes in the generated content section should be left unticked in 99% of cases.)

For live transcoding we produce VP9 WebM files, which are not supported by hardware encoding except through Intel VAAPI, I think, and I'm not sure about the stability or quality of that either.

Finally, for the generated previews and markers we use x264 with the veryslow preset to get the highest quality, since they are only generated once but viewed many times. If you wanted to make generation faster, that's where we could perhaps offer to change the veryslow preset to medium or even fast and still get better quality/performance than hardware encoders. That's of course only for anyone willing to trade quality for speed, and only as an extra option, not as the default.

HASJ commented 4 years ago

The only way live streaming would even remotely be viable here is with hardware acceleration. Software-bound encoding is a no-go, and VP9 is even worse. I am using an FX-6300; it was not optimized for these tasks, to put it kindly. The people asking for this feature need it. They do not care about the fabled and scary quality loss.

CenterThrowaway commented 4 years ago

I'd add that after Pascal, hardware encoding with Nvidia GPUs is leaps and bounds better: comparable to CPU-based x264 up to the medium preset, I believe, while being much faster.

praul commented 3 years ago

I too would be very pleased about this feature. It does not have to be as user-friendly as, for example, Jellyfin with its hardware encoding support. It could be an advanced setting to add parameters to ffmpeg (for playback/live transcoding or preview generation). If it causes problems, users could just set it back to default, but advanced users would be able to fiddle with it a little more.

It's quite easy to pass VAAPI support to Docker containers, and hardware encoding would greatly help with my high CPU load.

r3538987 commented 3 years ago

Can someone share the command line that is used when generating these video previews? The only thing I see at the moment is this, when generation sometimes fails on WMVs:

`ffmpeg.exe -v error -xerror -ss 85.32 -i F:\Downloads\1.wmv -t 0.75 -max_muxing_queue_size 1024 -y -c:v libx264 -pix_fmt yuv420p -profile:v high -level 4.2 -preset fast -crf 21 -threads 4 -vf scale=640:-2 -c:a aac -b:a 128k -strict -2 C:\Users\username\.stash-data\tmp\preview013.mp4: F:\Downloads\1.wmv: corrupt decoded frame in stream 1`

I would like to play around and at least see how GPU encoding would help.

It took me 6 hours to generate video previews for 700 videos of approximately 0.5-4 GB each (20 segments, 3% skip on both ends, fast preset) on an i5-4460.

ghost commented 3 years ago

Can someone share the command line that is used when generating these video previews? The only thing I see at the moment is this, when generation sometimes fails on WMVs:

`ffmpeg.exe -v error -xerror -ss 85.32 -i F:\Downloads\1.wmv -t 0.75 -max_muxing_queue_size 1024 -y -c:v libx264 -pix_fmt yuv420p -profile:v high -level 4.2 -preset fast -crf 21 -threads 4 -vf scale=640:-2 -c:a aac -b:a 128k -strict -2`

That is the command used to generate one preview segment. In your case it's run 20 times per video, with the results spliced together into the final preview. You can cut down on generation time by choosing fewer segments and setting the encoding preset to ultrafast.
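As a rough illustration of that scheme, the seek offsets for such a segmented generate pass could be computed like this (a hypothetical Go sketch; `segmentStarts` and its exact spacing rule are assumptions for illustration, not Stash's real code):

```go
package main

import "fmt"

// segmentStarts returns the seek offsets (in seconds) at which preview
// segments would be cut, skipping a fraction of the video at both ends.
// Illustrative sketch only, not Stash's actual implementation.
func segmentStarts(duration float64, count int, skip float64) []float64 {
	usable := duration * (1 - 2*skip) // span left after skipping both ends
	step := usable / float64(count)
	starts := make([]float64, count)
	for i := range starts {
		starts[i] = duration*skip + float64(i)*step
	}
	return starts
}

func main() {
	// A 600-second video, 20 segments, 3% skip on both ends,
	// matching the settings mentioned above.
	for _, s := range segmentStarts(600, 20, 0.03) {
		fmt.Printf("%.1f ", s)
	}
	fmt.Println()
}
```

Each offset would then become the `-ss` value of one ffmpeg invocation like the one quoted above.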

We're investigating hardware acceleration for transcoding, but I have no idea whether it will be useful for generation, seeing as hardware acceleration likely has more startup latency.

r3538987 commented 3 years ago

I just tried NVENC in HandBrake to see the difference on a random file. After 1 minute, the CPU had encoded only 1 minute 30 seconds of a simple 220 MB 720p WMV file. In comparison, an RTX 2070S managed to encode the entire 5-minute video within that same minute.

Can I currently create my own build, edit the hardcoded command line, and make use of NVENC? Is that possible?

HASJ commented 3 years ago

Can I currently create my own build, edit the hardcoded command line, and make use of NVENC? Is that possible?

Seconded.

praul commented 3 years ago

Any news on this? This is how Jellyfin handles GPU transcoding GUI-wise (screenshot omitted).

And this is the ffmpeg command: `ffmpeg -vaapi_device /dev/dri/renderD128 -i file:"INPUT.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_vaapi -b:v 6621920 -maxrate 6621920 -bufsize 13243840 -force_key_frames:0 "expr:gte(t,0+n_forced*3)" -g 72 -keyint_min 72 -sc_threshold 0 -vf "format=nv12|vaapi,hwupload,scale_vaapi=w=1022:h=574:format=nv12" -start_at_zero -vsync -1 -codec:a:0 aac -ac 6 -ab 256000 -copyts -avoid_negative_ts disabled -f hls -max_delay 5000000 -hls_time 3 -individual_header_trailer 0 -hls_segment_type mpegts -start_number 0 -hls_segment_filename "OUTPUT.ts" -hls_playlist_type vod -hls_list_size 0 -y "SOMEPLAYLISTIDONTKNOW.m3u8"`

It is very performant and easy on the CPU.

bnkai commented 3 years ago

Jellyfin uses a different player, so HLS is supported; that's not the case for Stash, as JWPlayer's HLS support depends on the browser AFAIK. That makes this more complicated to adopt.

reduych commented 3 years ago

For generating previews I found that this really doesn't help much. Since previews are encoded only 0.75 seconds at a time, the overhead of creating and concatenating (twelve 0.75-second clips) is probably a lot more than generating the individual bursts. Here's what my GPU graph looked like (graph omitted): notice only very sparse spikes of usage (as opposed to continuous usage when converting larger files), even with 12 parallel tasks, while the CPU was still at 100% the whole time (doing the preparation and other processing). Overall it did not help much.
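To put rough numbers on that overhead argument, a quick back-of-the-envelope calculation (the per-invocation cost below is an assumed figure for illustration, not a measurement):

```go
package main

import "fmt"

// Illustrative arithmetic for the observation above: with many short
// clips, fixed per-invocation overhead can rival the encode work itself.
func main() {
	const (
		clips       = 12
		clipSeconds = 0.75
		overheadSec = 0.5 // assumed per-ffmpeg-invocation startup/seek cost
	)
	encode := clips * clipSeconds   // seconds of actual video to encode
	overhead := clips * overheadSec // seconds spent just starting/seeking
	fmt.Printf("encoded video: %.1fs, fixed overhead: %.1fs (%.0f%% of total)\n",
		encode, overhead, 100*overhead/(encode+overhead))
}
```

With those assumed numbers, the fixed cost is 40% of the total, which would be consistent with the sparse GPU spikes described above.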

If anyone wants to test, change `"-c:v", "libx264"` to `hevc_nvenc` here.

willfe commented 3 years ago

There are a few subtle issues involved in hardware encoding beyond what's been mentioned here (I rambled about them a bit in https://github.com/stashapp/stash/issues/894#issuecomment-867616713):

* Hardware encoders are pickier about input formats, color spaces, etc.
  * ffmpeg can handle the conversion, but that happens in software, so you're back to CPU-intensive work even with hardware encoding.
  * Setting that up means keeping lists of all the formats each hardware encoder supports, comparing the format of the source file, and invoking inline conversion only when needed.
* Hardware encoding on consumer-grade GPUs is usually artificially limited to no more than N encodes at once (Nvidia limits it to 2); this can be fixed by the user patching the driver, but it's an annoyance regardless. There's no way to auto-detect the current limit either; the drivers won't report it.
* Fallback to software needs to be implemented to handle cases where the hardware encoder fails (bogus input format, too many encodes in progress, solar flares, etc.).

Now on the plus side:

* Hardware _decoding_ could potentially speed things up during hardware encoding _if_:
  * the source format is supported by the decoder (hardware decoders usually do support more formats than the encoders), and
  * the entire job can be done in a single invocation of _ffmpeg_ (the biggest speedup comes from keeping all the work and data on the GPU, because that avoids some expensive copies to/from main/video memory). From my understanding, _stash_ currently invokes _ffmpeg_ multiple times (once per desired segment), and a single invocation doing the same thing is slower because it slurps in the entire video instead of just seeking to each segment, so this speedup might not be worth it unless _ffmpeg_ can be made more efficient about this.

I don't think hardware decoding will help at all at the moment though, given how ffmpeg is currently used. Reading compressed data and decoding it on-CPU, versus initializing the GPU decoder, reading the compressed data, shipping it to GPU memory, waiting for the decode, and then shipping the output back to main memory -- I think software-only is faster in that case.

jimz011 commented 1 year ago

I think the problem is not that the software decoder is bad, but, for instance, I have files that will ramp my CPU cores up to 100%, interfering with other services that also need those cores (the very same thing happens when transcoding in software with Plex).

I have a pretty old CPU (4790K) and it has a lot of trouble playing some files, because the CPU simply can't keep up. The GPU however is a pretty decent one (GTX 1070) and has no problem doing multiple 4K hardware transcodes simultaneously without my CPU ramping up to 100%.

I understand that this is probably too hard to implement (or that people don't see the benefits of it) and thus will probably never come to Stash, but I wish it would. Yes, of course I can transcode by generating the files, but that takes up disk space.

notme43 commented 1 year ago

About 1/3 of my library is HEVC, in either 720p or 1080p. The software transcoder starts to struggle if I try outputting anything higher than 720p. I use Firefox everywhere, which doesn't support HEVC for licensing reasons, so it's always transcoding and tying up the host CPU.

I experimented with building Stash on top of the nvidia/cuda Docker stack and was able to achieve hardware-accelerated decoding and encoding. I'm pretty impressed with the results. I let a 1080p HEVC video stream as H264 for about 5 minutes: CPU load stayed around 1.00 while FFmpeg quickly filled the buffer and throttled the GPU. I noticed the biggest difference when using both NVDEC and NVENC; just enabling one didn't seem to affect CPU usage much. I'm using a GTX 1650 with a Ryzen 5 3600.

I don't know Go, my changes are pretty hacky, and this isn't robust enough for a PR. But it works as a proof of concept, and I'm sure someone wiser can implement it properly. I did notice unintended behavior when accessing Stash over a reverse proxy with SSL: FFmpeg would peg the GPU at 100% and then fail after about 3 minutes of playing a video. This is probably down to my own nginx misconfiguration; it did not occur when accessing Stash directly.

Here is my modified Dockerfile from docker/build/x86_64/Dockerfile.

I changed the video codec in pkg/ffmpeg/codec.go on line 14: `VideoCodecLibX264 VideoCodec = "h264_nvenc"`

And the ffmpeg arguments for StreamFormatH264 in pkg/ffmpeg/stream.go, starting on line 68. The "+" in front of frag_keyframe was strictly necessary, I found, but the rest I tuned to preference because the default quality was quite poor.

StreamFormatH264 = StreamFormat{
        codec:    VideoCodecLibX264,
        format:   FormatMP4,
        MimeType: MimeMp4,
        extraArgs: []string{
                "-acodec", "aac",
                "-pix_fmt", "yuv420p",
                "-movflags", "+frag_keyframe+empty_moov",
                "-preset", "llhp",
                "-rc", "vbr",
                "-zerolatency", "1",
                "-temporal-aq", "1",
                "-cq", "24",
        },
}

Running make docker-build after this should produce a Stash container capable of GPU encoding. For decoding, I set -hwaccel auto as a setting in the interface under "FFmpeg LiveTranscode Input Args". Setting it globally like this broke the other transcode formats where hardware accelerated decoding is not possible (like WebM, the default transcode target). I commented out the WebM scene routes and endpoints in internal/api/routes_scene.go as a workaround, so it always falls back to MP4.
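For illustration, extraArgs like the ones above end up spliced into a full ffmpeg command line; a minimal Go sketch of that assembly might look like the following (`buildTranscodeArgs` and the exact argument ordering are assumptions for illustration, not Stash's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// buildTranscodeArgs assembles an ffmpeg argument list from a video codec,
// a StreamFormat-style extraArgs slice, and input/output paths.
// Hypothetical sketch of how such fields could map onto a command line.
func buildTranscodeArgs(codec string, extraArgs []string, input, output string) []string {
	args := []string{"-i", input, "-c:v", codec}
	args = append(args, extraArgs...)
	args = append(args, "-f", "mp4", output)
	return args
}

func main() {
	extra := []string{
		"-acodec", "aac",
		"-pix_fmt", "yuv420p",
		"-movflags", "+frag_keyframe+empty_moov",
	}
	args := buildTranscodeArgs("h264_nvenc", extra, "in.mkv", "pipe:")
	// Print the assembled command for inspection.
	fmt.Println("ffmpeg " + strings.Join(args, " "))
}
```

The point is that swapping the codec constant, as described above, changes only one slot in the assembled argument list; the rest of the pipeline stays intact.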

One of the obstacles mentioned by @willfe was the transcode limit imposed by the Nvidia drivers. I didn't try this because my host is already patched, but the transcode limit patch can be integrated into docker containers so the user doesn't have to bother with it.

I think the missing piece for a possible all-in-one Stash container with hardware transcoding is the logic to determine when to use it, which is tricky depending on the particular GPU architecture the user has, even with the Nvidia CUDA tools.

Edit: Wow, preview generation is almost instantaneous.

i-am-at0m commented 1 year ago

Would a similar technique allow for QuickSync transcoding?

notme43 commented 1 year ago

Would a similar technique allow for QuickSync transcoding?

AFAIK QuickSync leverages libva, so as long as the host has the supporting libraries, it would just be a matter of exposing the video card to the container, like this: `--device /dev/dri/renderD128`.

bnkai commented 1 year ago

There is an open PR https://github.com/stashapp/stash/pull/3419 btw if anyone is interested in testing or providing some feedback

electblake commented 1 year ago

Would a similar technique allow for QuickSync transcoding?

AFAIK QuickSync leverages libva, so as long as the host has the supporting libraries, it would just be a matter of exposing the video card to the container, like this: `--device /dev/dri/renderD128`.

exactly what I was hoping you'd say

Edit: maybe I'm getting ahead of myself, but this is the guide I used for exposing the card to Plex (it shows commands to list available devices etc.; it's Synology-specific but may work for others):

https://medium.com/@MrNick4B/plex-on-docker-on-synology-enabling-hardware-transcoding-fa017190cad7

Tweeticoats commented 1 year ago

I have an unusual NAS with a Rockchip RK3399 ARM CPU. It does support hardware decoding with the h264_rkmpp and hevc_rkmpp decoders. I believe I need to compile ffmpeg myself to use these decoders, which I have not bothered with yet.

Would it be possible to have a setting to specify the extra command line arguments for edge cases like this?

NodudeWasTaken commented 1 year ago

Great news, hardware encoding is now merged and ready for testing for anyone willing. It should work for:

Note that RPi and VAAPI don't support direct file transcoding for h264 (MP4), so h264 hardware transcoding is only used for HLS (h264).

Note that the normal Docker build only supports VAAPI and v4l2m2m.

You can check the logs to see which codecs were found and enabled, and check the debug log for why the others failed.

derN3rd commented 1 year ago

Having this enabled on my Unraid 6.11.5 server (Intel Celeron J3455) reports no available HW codecs:

23-03-10 13:10:57 Info    [InitHWSupport] Supported HW codecs:

Plex is managing to use hw acceleration just fine, so not sure where to start looking here.

My docker-compose.yml already includes the device passthrough

    devices:
      - "/dev/dri/card0:/dev/dri/card0"
      - "/dev/dri/renderD128:/dev/dri/renderD128"

Any idea/tips how to get more information for this?

i-am-at0m commented 1 year ago

Do you also have the intel-gpu-top plugin installed and have rebooted afterwards?


NodudeWasTaken commented 1 year ago

Having this enabled on my Unraid 6.11.5 Server (Intel Celeron J3455) reports back no available HW codecs.

23-03-10 13:10:57 Info    [InitHWSupport] Supported HW codecs:

Plex is managing to use hw acceleration just fine, so not sure where to start looking here.

My docker-compose.yml already includes the device passthrough

    devices:
      - "/dev/dri/card0:/dev/dri/card0"
      - "/dev/dri/renderD128:/dev/dri/renderD128"

Any idea/tips how to get more information for this?

When stash starts, go into the webui->settings->logs and set the log level to debug, find the entry with codec h264_qsv and send the specific error

derN3rd commented 1 year ago

Do you also have the intel-gpu-top plugin installed and have rebooted afterwards?

No, I didn't see this in the docs or in the commit. Is it used by Stash or just for debugging? As Linux on Unraid servers has no package manager, it's kind of hard to build packages for it yourself.

When stash starts, go into the webui->settings->logs and set the log level to debug, find the entry with codec h264_qsv and send the specific error

Switching to debug or even trace shows nothing more at server startup. When starting Stash, the only hint of hardware acceleration is [InitHWSupport] Supported HW codecs:. When I try to live transcode, it works but as slowly as it did with CPU only, and the logs show nothing related to hardware acceleration (tried HLS, WebM, and DASH; all run slowly, apparently without hardware acceleration):

2023-03-10 13:30:03
Debug   
[transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-v_1080 at segment #0
2023-03-10 13:30:03
Debug   
[transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-a_1080 at segment #0
2023-03-10 13:30:02
Debug   
[transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-v_1080 at segment #0
2023-03-10 13:30:02
Debug   
[transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-a_1080 at segment #0
2023-03-10 13:30:02
Debug   
[transcode] returning DASH manifest for scene 4711
2023-03-10 13:29:53
Debug   
[transcode] returning DASH manifest for scene 4711
2023-03-10 13:28:30
Debug   
[transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_hls at segment #0
2023-03-10 13:28:29
Debug   
[transcode] returning HLS manifest for scene 4711
2023-03-10 13:28:10
Debug   
[transcode] streaming scene 4711 as video/webm
2023-03-10 13:28:08
Debug   
[transcode] streaming scene 4711 as video/webm

i-am-at0m commented 1 year ago

Do you also have the intel-gpu-top plugin installed and have rebooted afterwards?

No, didn't see this in the docs or in the commit. Is it used by stash or just for debugging? As the linux on unraid servers has no package manager, it's kinda hard to build packages on your own for it.

AFAIK Unraid doesn't have drivers for QSV by default? I'd been looking into it before putting my Plex install into a container and moving it over, and came across this guide. I figured I'd probably have to do the same thing for this container, no? I've been asleep most of the time this release has been out, so I haven't had a chance to try it with Stash.

https://forums.unraid.net/topic/77943-guide-plex-hardware-acceleration-using-intel-quick-sync/ (sorry, that's the original, hard way to do it; the easier way is:)

https://forums.unraid.net/topic/131548-add-intel-igpu-qsv-quick-sync-encoding-to-official-plex-media-server-the-easy-way/

derN3rd commented 1 year ago

I just installed both the GPU Statistics and Intel-GPU-Top apps for Unraid, and the installation log said Intel Kernel Module already enabled, so I guess it already has the drivers. The other part of the tutorial is already covered by my shared docker-compose.yml config, where I pass through the devices.

Still the same results for Stash in the log after startup.

guim31 commented 1 year ago

I use Unraid and already had the two plugins installed: Intel GPU Top and GPU Statistics. I also set the `devices:` passthrough.

But only my CPU is used :/ I don't have dev skills, but I can test things if someone tells me what to try!

CarlNs92891 commented 1 year ago

QSV is not loaded for me, with device passthrough in docker-compose that works for Jellyfin:

    devices:
      - /dev/dri:/dev/dri

Here is the stash log:

stash is running at http://localhost:9999/
2023-03-12 13:11:35
Info    
stash is listening on 0.0.0.0:9999
2023-03-12 13:11:35
Info    
stash version: v0.19.1-56-g9aa7ec57 - Official Build - 2023-03-10 22:31:13
2023-03-12 13:11:35
Info    
[InitHWSupport] Supported HW codecs:
2023-03-12 13:11:35
Debug   
[InitHWSupport] Codec vp9_vaapi not supported. Error output:
[AVHWDeviceContext @ 0x7fb07a26de00] Failed to initialise VAAPI connection: -1 (unknown libva error).
Device creation failed: -5.
Failed to set value '/dev/dri/renderD128' for option 'vaapi_device': I/O error
Error parsing global options: I/O error
2023-03-12 13:11:35
Debug   
[InitHWSupport] Codec vp9_qsv not supported. Error output:
Device creation failed: -12.
Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory
Error parsing global options: Out of memory
2023-03-12 13:11:35
Debug   
[InitHWSupport] Codec h264_v4l2m2m not supported. Error output:
[h264_v4l2m2m @ 0x7efe12e10880] Could not find a valid device
[h264_v4l2m2m @ 0x7efe12e10880] can't configure encoder
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
2023-03-12 13:11:35
Debug   
[InitHWSupport] Codec h264_vaapi not supported. Error output:
[AVHWDeviceContext @ 0x7fd6cc337e00] Failed to initialise VAAPI connection: -1 (unknown libva error).
Device creation failed: -5.
Failed to set value '/dev/dri/renderD128' for option 'vaapi_device': I/O error
Error parsing global options: I/O error
2023-03-12 13:11:35
Debug   
[InitHWSupport] Codec h264_qsv not supported. Error output:
Device creation failed: -12.
Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory
Error parsing global options: Out of memory
2023-03-12 13:11:35
Debug   
[InitHWSupport] Codec h264_nvenc not supported. Error output:
Unrecognized option 'rc'.
Error splitting the argument list: Option not found
2023-03-12 13:11:34
Debug   
Reading scraper configs from /root/.stash/scrapers
2023-03-12 13:11:34
Debug   
Reading plugin configs from /root/.stash/plugins
2023-03-12 13:11:34
Info    
using config file: /root/.stash/config.yml

Running ffmpeg -encoders on the host lists h264_qsv, whereas running it in a shell inside the Stash Docker container does not.

NodudeWasTaken commented 1 year ago

2023-03-12 13:11:35 Debug
[InitHWSupport] Codec h264_qsv not supported. Error output: Device creation failed: -12. Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory Error parsing global options: Out of memory

A few things here. Firstly, I checked Alpine Linux (which the Stash Docker image is built on): they don't compile any hardware codecs into ffmpeg. You should try the CUDA build, which is built on Ubuntu and should have most hardware codecs. Another thing is that it says Out of memory, so I don't have high hopes that switching to Ubuntu will work any better.

i-am-at0m commented 1 year ago

Binhex uses Arch as their base image for Plex.

CarlNs92891 commented 1 year ago

2023-03-12 13:11:35 Debug [InitHWSupport] Codec h264_qsv not supported. Error output: Device creation failed: -12. Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory Error parsing global options: Out of memory

Multiple things here, firstly i checked alpine linux (what stash docker is built on), they dont compile any hardware codecs for ffmpeg. You should try using the CUDA build which is built on Ubuntu and should have most hardware codecs. Another thing is that it says Out of memory, so i dont have high hopes switching to Ubuntu will work any better.

Any idea what Out of memory means here? Because I have at least 8 GB of RAM free.

NodudeWasTaken commented 1 year ago

2023-03-12 13:11:35 Debug [InitHWSupport] Codec h264_qsv not supported. Error output: Device creation failed: -12. Failed to set value 'qsv=hw' for option 'init_hw_device': Out of memory Error parsing global options: Out of memory

Multiple things here, firstly i checked alpine linux (what stash docker is built on), they dont compile any hardware codecs for ffmpeg. You should try using the CUDA build which is built on Ubuntu and should have most hardware codecs. Another thing is that it says Out of memory, so i dont have high hopes switching to Ubuntu will work any better.

Any idea what Out of memory means here? Because I have at least 8 GB of RAM free.

The command under the hood is `ffmpeg -init_hw_device qsv=hw -filter_hw_device hw -f lavfi -i color=c=red -t 0.1 -c:v h264_qsv -global_quality 20 -preset faster -vf hwupload=extra_hw_frames=64,format=qsv,scale_qsv=-1:160 -f null -`, which apparently fails for you. I would suspect an artificial limit on the Docker container's memory; you can check with `docker stats --no-stream`.
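That probe amounts to running the quoted ffmpeg command and checking its exit status. The sketch below only mirrors that idea (`qsvProbeArgs` and `codecSupported` are hypothetical names for illustration, not Stash's actual InitHWSupport code):

```go
package main

import (
	"fmt"
	"os/exec"
)

// qsvProbeArgs is the ffmpeg command line quoted above: encode 0.1s of
// a generated red frame with h264_qsv and discard the output.
func qsvProbeArgs() []string {
	return []string{
		"-init_hw_device", "qsv=hw", "-filter_hw_device", "hw",
		"-f", "lavfi", "-i", "color=c=red", "-t", "0.1",
		"-c:v", "h264_qsv", "-global_quality", "20", "-preset", "faster",
		"-vf", "hwupload=extra_hw_frames=64,format=qsv,scale_qsv=-1:160",
		"-f", "null", "-",
	}
}

// codecSupported runs the probe and treats a zero exit status as support.
func codecSupported(ffmpegPath string, args []string) bool {
	cmd := exec.Command(ffmpegPath, args...)
	return cmd.Run() == nil
}

func main() {
	fmt.Println(len(qsvProbeArgs()), "probe arguments")
	// On a host with working QSV drivers,
	// codecSupported("ffmpeg", qsvProbeArgs()) would report h264_qsv support.
}
```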

derN3rd commented 1 year ago

I just found out that I had an old config and was therefore not seeing debug messages in my log. After enabling them, I now see the same out-of-memory error:

time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec h264_nvenc not supported. Error output:\nUnrecognized option 'rc'.\nError splitting the argument list: Option not found\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec h264_qsv not supported. Error output:\nDevice creation failed: -12.\nFailed to set value 'qsv=hw' for option 'init_hw_device': Out of memory\nError parsing global options: Out of memory\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec h264_vaapi not supported. Error output:\n[AVHWDeviceContext @ 0x147c11ae7e00] Failed to initialise VAAPI connection: -1 (unknown libva error).\nDevice creation failed: -5.\nFailed to set value '/dev/dri/renderD128' for option 'vaapi_device': I/O error\nError parsing global options: I/O error\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec h264_v4l2m2m not supported. Error output:\n[h264_v4l2m2m @ 0x14eea5bff940] Could not find a valid device\n[h264_v4l2m2m @ 0x14eea5bff940] can't configure encoder\nError initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec vp9_qsv not supported. Error output:\nDevice creation failed: -12.\nFailed to set value 'qsv=hw' for option 'init_hw_device': Out of memory\nError parsing global options: Out of memory\n"
time="2023-03-13 19:36:38" level=debug msg="[InitHWSupport] Codec vp9_vaapi not supported. Error output:\n[AVHWDeviceContext @ 0x14b4e4e92e00] Failed to initialise VAAPI connection: -1 (unknown libva error).\nDevice creation failed: -5.\nFailed to set value '/dev/dri/renderD128' for option 'vaapi_device': I/O error\nError parsing global options: I/O error\n"
time="2023-03-13 19:36:38" level=info msg="[InitHWSupport] Supported HW codecs:\n"

`docker stats` returns:

CONTAINER ID   NAME                      CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O   PIDS
4cf9081a4a11   stash                     0.16%     9.988MiB / 7.627GiB   0.13%     20.6kB / 3.66kB   0B / 0B     10

I made sure the /dev/dri folder has full permissions, and I'm using the latest development Docker release.

i-am-at0m commented 1 year ago

Are you passing /dev/dri to the container as a volume? Or as a device? (It should be the second one)

derN3rd commented 1 year ago

Are you passing /dev/dri to the container as a volume? Or as a device? (It should be the second one)

My docker-compose.yml has both entries configured as devices:

    devices:
      - "/dev/dri/card0:/dev/dri/card0"
      - "/dev/dri/renderD128:/dev/dri/renderD128"

I tried combinations of only one of them, and also tried limiting the container's memory as well as reserving more, which didn't change the error messages at all.

services:
  stash:
    image: stashapp/stash:development
    # [...]
    mem_limit: 2048m
    mem_reservation: 1024M

Still Out of memory in all cases

NodudeWasTaken commented 1 year ago

Are you passing /dev/dri to the container as a volume? Or as a device? (It should be the second one)

My docker-compose.yml has both entries configured as devices:

    devices:
      - "/dev/dri/card0:/dev/dri/card0"
      - "/dev/dri/renderD128:/dev/dri/renderD128"

I tried combinations of only one of them, and also tried limiting the container's memory as well as reserving more, which didn't change the error messages at all.

services:
  stash:
    image: stashapp/stash:development
    # [...]
    mem_limit: 2048m
    mem_reservation: 1024M

Still Out of memory in all cases

Could you try modifying the Docker build, adding the following below the other apk add / apt install lines?

For the Alpine build: RUN apk add --no-cache mesa-dri-gallium libva-intel-driver
For the CUDA build: RUN apt install intel-media-va-driver-non-free -y

derN3rd commented 1 year ago

I tried the image by CarlNs92891 (who deleted their message or got it deleted, idk) which does

apt install libvips-tools ffmpeg musl 
apt install intel-media-va-driver-non-free vainfo

and with that it works!

derN3rd commented 1 year ago

Can one of the maintainers say what the current blocker is here?

I'd really like to have this running in the official Docker images so I can use Watchtower auto-updates for my containers; self-building with these tricks is therefore not a good option for me.

How about having the CUDA image also auto released to docker hub as stashapp/stash:CUDA-latest or similar?

i-am-at0m commented 1 year ago

QSV works I think?

derN3rd commented 1 year ago

QSV works I think?

I'm not sure anymore what kind of hardware encoding works on my NAS, but apparently it's not QSV. I still get [InitHWSupport] Supported HW codecs: with nothing listed in my logs with the default latest Docker image.

With the CUDA image it works, but it's not published on Docker Hub, which is currently my main issue.

nerethos commented 1 year ago

QSV transcoding works fine with the CUDA build. I agree, it would be great if the maintainers could build and publish this on dockerhub.

I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5, as it includes all the usermode drivers for QSV and NVENC, plus various optimisations for full hardware transcoding. This is working really well for me and the performance is great (I have a 12th-gen Intel iGPU). My only problem with the CUDA build and my jellyfin-ffmpeg build is that sprite/preview generation does not utilise hardware acceleration. Unfortunately I don't have the knowledge to debug and fix it.

As the Jellyfin maintainers have already done a lot of hard work optimising hardware transcoding in ffmpeg, would it make sense for Stash to work towards adopting their version?

FoodFighters commented 1 year ago

There's a few subtle issues involved in hardware encoding beyond what's been mentioned here (I rambled about them a bit in #894 (comment)):

* Hardware encoders are pickier about input formats, color spaces, etc.

  * ffmpeg can handle the conversion, but that's in-software, so you're back to CPU-intensive work even with hardware encoding.
  * Setting that up means keeping lists of all the formats each hardware encoder supports, comparing the format of the source file, and invoking inline conversion only when needed.

* Hardware encoding on consumer-grade GPUs are usually artificially limited to no more than N encodes at once (nvidia limits it to 2); this can be fixed by the user patching the driver, but it's an annoyance regardless. There's no way to auto-detect the current limit either; the drivers won't report it.

* Fallback to software needs to be implemented to handle cases where the hardware encoder fails (bogus input format, too many encodes in progress, solar flares, etc.).

Now on the plus side:

* Hardware _decoding_ could potentially speed things up during hardware encoding _if_:

  * the source format is supported by the decoder (hardware decoders usually do support more formats than the encoders), and
  * the entire job can be done in a single invocation of _ffmpeg_ (the biggest speedup comes from keeping all the work and data on the GPU, because that avoids some expensive copies to/from main/video memory). From my understanding _stash_ currently invokes _ffmpeg_ multiple times (once per desired segment), and invoking it a single time to do the same thing is slower because it slurps in the entire video instead of just seeking to each segment, so again this speedup might not be worth it unless a way can be found to get _ffmpeg_ to be more efficient about this.

I don't think hardware decoding will help at all at the moment though given how ffmpeg is currently used. Reading compressed data and decoding it on-CPU versus initializing the GPU decoder, reading the compressed data, shipping it to GPU memory, waiting for the decode and then shipping the output back to main memory -- I think software-only is faster in that case.

Nvidia limited you to 3 encodes, not 2, and they recently changed it to 5.

algers commented 1 year ago

QSV transcoding works fine with the CUDA build. I agree, it would be great if the maintainers could build and publish this on dockerhub.

I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5 as it includes all the usermode drivers for QSV and NVENC, and also various optimisations for full hardware transcoding. This is working really well for me and the performance is great (I have a 12th gen intel iGPU). My only problem with the CUDA build and my jellyfin-ffmpeg build is that sprite/preview generation does not utilise hardware acceleration. Unfortunately I don't have the knowledge to debug and fix it.

As the jellyfin maintainers have already done a lot of hard work in optimising the hardware transcoding on ffmpeg, would it make sense for stash to work towards implementing their version?

Mind sharing the build file?

anonstash commented 1 year ago

@algers nerethos shared this dockerhub link in the discord for the jellyfin-ffmpeg5 build: https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg

Just wanted to add another data point that I wasn't able to get QSV working on an alderlake chip but the jellyfin + CUDA build linked above worked out of the box. Hopefully we can get better HW encoding support added to the release build in the near future.

wormvortex commented 1 year ago

> @algers nerethos shared this dockerhub link in the discord for the jellyfin-ffmpeg5 build: https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg
>
> Just wanted to add another data point that I wasn't able to get QSV working on an alderlake chip but the jellyfin + CUDA build linked above worked out of the box. Hopefully we can get better HW encoding support added to the release build in the near future.

This works perfectly. Any chance of it being updated to match the newest release :D

guim31 commented 1 year ago

When I pull this image https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg in place of my installed nightly version, it crashes (probably the two aren't swappable because of the date difference). I hope I'll soon be able to use HW transcode within my "classic" stash install.

JeremyTsai26 commented 1 year ago

> QSV transcoding works fine with the CUDA build. I agree, it would be great if the maintainers could build and publish this on dockerhub.
>
> I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5 as it includes all the usermode drivers for QSV and NVENC, and also various optimisations for full hardware transcoding. This is working really well for me and the performance is great (I have a 12th gen intel iGPU). My only problem with the CUDA build and my jellyfin-ffmpeg build is that sprite/preview generation does not utilise hardware acceleration. Unfortunately I don't have the knowledge to debug and fix it.
>
> As the jellyfin maintainers have already done a lot of hard work in optimising the hardware transcoding on ffmpeg, would it make sense for stash to work towards implementing their version?

@nerethos Can this version use VAAPI for transcoding with an old iGPU?

Casper889 commented 11 months ago

I got this working with iGPU on the current docker image release.

  1. Pass through the iGPU (/dev/dri/card0 and /dev/dri/renderD128 in my case)
  2. Install the driver in the docker image: `apk add libva-intel-driver`
  3. In Stash's System settings, pass these arguments to ffmpeg: `-hwaccel` and `auto`
  4. In Stash's System settings, turn on FFmpeg hardware encoding

Hope this helps someone
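The steps above can be rendered roughly as the following commands. The image tag, port, and device node paths are assumptions; adjust them for your system:

```shell
#!/bin/sh
# Steps 1-2: pass the iGPU nodes through to the container and install
# the VA-API usermode driver inside it (the stash image is Alpine-based,
# hence apk).
docker run -d --name stash \
  --device /dev/dri/card0 --device /dev/dri/renderD128 \
  -p 9999:9999 \
  stashapp/stash:latest
docker exec stash apk add libva-intel-driver

# Steps 3-4 are done in the web UI: under Settings -> System, add
# "-hwaccel" and "auto" to the ffmpeg transcode arguments and enable
# "FFmpeg hardware encoding".
```

Note the driver install does not survive a container recreation, so it needs to be re-run (or baked into a derived image) after pulling a new release.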

razgriz88 commented 11 months ago

> I got this working with iGPU on the current docker image release.

I'm running Unraid with a 13th gen Intel chip. I haven't had a chance to try this yet because I'm still building the server, but does this work with sprite and preview generation? Those are the only 2 things I need from hardware accel.

Casper889 commented 11 months ago

> > I got this working with iGPU on the current docker image release.
>
> I'm running Unraid with a 13th gen Intel chip. I haven't had a chance to try this yet because I'm still building the server, but does this work with sprite and preview generation? Those are the only 2 things I need from hardware accel.

I got this working on Unraid as well, but with a much older CPU (Ivy Bridge). Generation tasks still don't use hardware acceleration, just transcoding tasks. I'm not sure Stash supports this, as I didn't find any config options related to it.