[jellyfin] nvenc encoding fails: unable to load libnvcuvid.so.1

Alex-Orsholits commented 2 years ago

App Name

jellyfin

SCALE Version

22.02.0

App Version

10.7.7_9.0.43

Application Events

2022-03-01 4:32:07
Started container gaofin-jellyfin
2022-03-01 4:32:06
Created container gaofin-jellyfin
2022-03-01 4:32:02
Container image "tccr.io/truecharts/jellyfin:v10.7.7@sha256:0136db4677a2ee2ee8a6962d813d6e3b49aa86784a7cfdc3af76427db32c3470" already present on machine
2022-03-01 4:32:01
Started container inotify
2022-03-01 4:32:00
Created container inotify
2022-03-01 4:31:57
Container image "ghcr.io/truecharts/alpine:v3.14.2@sha256:4095394abbae907e94b1f2fd2e2de6c4f201a5b9704573243ca8eb16db8cdb7c" already present on machine
2022-03-01 4:30:23
Started container autopermissions
2022-03-01 4:30:22
Created container autopermissions
2022-03-01 4:30:19
Container image "ghcr.io/truecharts/alpine:v3.14.2@sha256:4095394abbae907e94b1f2fd2e2de6c4f201a5b9704573243ca8eb16db8cdb7c" already present on machine
2022-03-01 4:30:19
Add eth0 [172.16.0.78/16] from ix-net
Successfully assigned ix-gaofin/gaofin-jellyfin-664bdc6cff-g5d9b to ix-truenas
2022-03-01 4:30:14
Created pod: gaofin-jellyfin-664bdc6cff-g5d9b

Application Logs

2022-02-28 19:45:40.332019+00:00[04:45:40] [INF] [5] Jellyfin.Api.Helpers.TranscodingJobHelper: /usr/lib/jellyfin-ffmpeg/ffmpeg -hwaccel cuda -hwaccel_output_format cuda -extra_hw_frames 3 -autorotate 0 -i file:"/gaochan/TV & Movies/*videofile*.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_nvenc -preset default -b:v 320436 -maxrate 320436 -bufsize 640872 -profile:v:0 high -g:v:0 75 -keyint_min:v:0 75 -sc_threshold:v:0 0 -vf "scale_cuda=w=426:h=238:format=nv12" -start_at_zero -vsync -1 -codec:a:0 copy -strict -2 -copyts -avoid_negative_ts disabled -max_muxing_queue_size 2048 -f hls -max_delay 5000000 -hls_time 3 -hls_segment_type mpegts -start_number 0 -hls_segment_filename "/config/transcodes/19c5e8171ef17b546e613adbf4202233%d.ts" -hls_playlist_type vod -hls_list_size 0 -y "/config/transcodes/19c5e8171ef17b546e613adbf4202233.m3u8"
2022-02-28 19:45:40.780877+00:00[04:45:40] [ERR] [22] Jellyfin.Api.Helpers.TranscodingJobHelper: FFmpeg exited with code 1
2022-02-28 19:45:40.847353+00:00[04:45:40] [WRN] [22] Jellyfin.Api.Controllers.DynamicHlsController: cannot serve /config/transcodes/19c5e8171ef17b546e613adbf42022330.ts as transcoding quit before we got there
2022-02-28 19:45:40.849046+00:00[04:45:40] [ERR] [22] Jellyfin.Server.Middleware.ExceptionMiddleware: Error processing request: Could not find file '/config/transcodes/19c5e8171ef17b546e613adbf42022330.ts'. URL GET /videos/e8b08115-68a9-72c6-1ddd-a04cf0c960f3/hls1/main/0.ts.

Application Configuration

I launched the jellyfin docker with a mostly stock configuration. Below are the only settings I changed (apart from adding additional app storage)

Custom Resource Limits

CPU: 10000m
RAM: 8Gi
GPU Configuration: Allocate 1 nvidia.com/gpu GPU

Describe the bug

When attempting to transcode content using the nvenc encoder, FFmpeg exits with error code 1. The Jellyfin logs do not specify the actual reason for the failure, but running the logged command directly in the pod shell produces the following error:

[h264 @ 0x55783247bf40] Cannot load libnvcuvid.so.1
[h264 @ 0x55783247bf40] Failed loading nvcuvid.
[h264 @ 0x55783247bf40] Failed setup for format cuda: hwaccel initialisation returned error.
Impossible to convert between the formats supported by the filter 'graph 0 input from stream 0:0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
Conversion failed!
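One way to confirm the symptom independently of FFmpeg is to check whether the dynamic loader inside the pod can see libnvcuvid.so.1 at all. The sketch below is only an illustration and not something posted in the thread: lib_in_cache is a hypothetical helper name, and it assumes the container image ships ldconfig (glibc-based images generally do).

```shell
#!/bin/sh
# Hypothetical helper (not part of Jellyfin or TrueCharts): reads an
# `ldconfig -p` listing on stdin and succeeds if the given soname is listed.
lib_in_cache() {
    # ldconfig -p prints lines like:
    #   libfoo.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libfoo.so.1
    # so the soname is the first whitespace-separated field.
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage inside the pod shell (assumes ldconfig is available in the image):
#   ldconfig -p | lib_in_cache libnvcuvid.so.1 && echo present || echo missing
```

If this reports "missing" while nvidia-smi still works, the driver's video libraries were probably not mounted into the container, which would be consistent with the FFmpeg error.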

I verified that the graphics card is passed to the pod and is available by running nvidia-smi in the pod shell:

To Reproduce

Install the jellyfin docker with a GPU allocated (in my case, a GTX 1070).
Enable hardware (nvenc) encoding in the Jellyfin settings.
Attempt to transcode any video with nvenc.

Expected Behavior

FFmpeg returns a success status and provides the user with a transcoded stream.

Screenshots

Additional Context

My truenas SCALE hardware is as follows:

CPU: Intel Xeon e5 2678 v3 x2
RAM: Samsung 128GB DDR4 ECC RAM
Mobo: Supermicro X10DRi (includes ASPEED VGA adaptor as primary video out)
GPU: Gainward GTX 1070 Phantom

The GPU is not isolated from the host OS and correctly shows up in both TrueNAS and the Jellyfin container.

I've read and agree with the following

[X] I've checked all open and closed issues and my issue is not there.

PrivatePuffin commented 2 years ago

We don't build the container in this case, there is nothing we can do about this. :(

meh301 commented 2 years ago

Ah, I was afraid of this... Just one question: on container launch, how does jellyfin get the GPU capabilities? It seems possible to explicitly request the required Nvidia hooks when launching, for example, a docker container with NVIDIA_DRIVER_CAPABILITIES=video,compute,utility.

PrivatePuffin commented 2 years ago

afaik the container should always start with the driver capabilities set. But even so afaik @stavros-k made sure it was also forced from our side on these cases.

stavros-k commented 2 years ago

afaik the container should always start with the driver capabilities set. But even so afaik @stavros-k made sure it was also forced from our side on these cases.

Actually no, we only force removal of capabilities when no GPU is selected. If I'm not mistaken, iX injects those capabilities when selecting GPU.

@meh301 You can verify the capabilities by opening a bash shell in jellyfin (3-dots > shell) and running env (or env | grep NVIDIA for a shorter list).

So we can be sure that we are not breaking anything here.

PrivatePuffin commented 2 years ago

To be clear: even if they are not there, that is primarily the responsibility of the container creator.

Alex-Orsholits commented 2 years ago

Thank you for your replies. env shows a visible nvidia device but not much else:

$ env | grep NVIDIA
NVIDIA_VISIBLE_DEVICES=GPU-b9c6b00b-95b2-0893-a633-772387351cf6

The issue is most probably due to the container itself sadly
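For context on the env output above: the NVIDIA container runtime documents NVIDIA_DRIVER_CAPABILITIES as the variable that selects which driver libraries are mounted into a container, with video covering the Video Codec SDK libraries and all enabling everything. A minimal sketch of a check for that, assuming the documented behaviour applies to the runtime used here; has_video_cap is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a comma-separated capability list
# (the format used by NVIDIA_DRIVER_CAPABILITIES) includes "video" or "all".
has_video_cap() {
    case ",$1," in
        *,all,*|*,video,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the current environment; an unset variable is treated as empty.
if has_video_cap "${NVIDIA_DRIVER_CAPABILITIES:-}"; then
    echo "video capability requested"
else
    echo "video capability not requested"
fi
```

In an environment where only NVIDIA_VISIBLE_DEVICES is set, this prints "video capability not requested".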

PrivatePuffin commented 2 years ago

We might want to override capabilities=all like k8s-at-home is doing in their containers, for all containers that get an nvidia GPU assigned... @stavros-k ?

stavros-k commented 2 years ago

We might want to override capabilities=all like k8s-at-home is doing in their containers, for all containers that get an nvidia GPU assigned... @stavros-k ?

Yes, I'll take a look at it in the next few days.

PrivatePuffin commented 2 years ago

@all-contributors please add @Alex-Orsholits for bugs

allcontributors[bot] commented 2 years ago

@Ornias1993

I've put up a pull request to add @Alex-Orsholits! :tada:

truecharts-admin commented 1 year ago

This issue is locked to prevent necro-posting on closed issues. Please create a new issue or contact staff on Discord if the problem persists.
