google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

ffmpeg with `h264_nvenc` fails to run on gVisor with `-nvproxy` #9452

Open luiscape opened 11 months ago

luiscape commented 11 months ago

Description

ffmpeg supports video encoding and decoding using NVIDIA GPUs. Here's an example command:

wget -q -O /neoncat.mp4 https://media.giphy.com/media/sIIhZliB2McAo/giphy.mp4 && \
    ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i /neoncat.mp4 -c:a copy -c:v h264_nvenc -b:v 5M /neoncat_out.mp4
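For readers less familiar with these flags, the same command can be written as a bash array with one comment per option. As read here (this is an interpretation, not something the report states), it uses both GPU video engines: `-hwaccel cuda` decodes through NVDEC and `h264_nvenc` encodes through NVENC. The array name FFMPEG_ARGS is hypothetical.

```shell
#!/usr/bin/env bash
# The reported ffmpeg command, one option per line with a short explanation.
FFMPEG_ARGS=(
  -y                              # overwrite the output file if it exists
  -vsync 0                        # passthrough: keep the input frame timestamps
  -hwaccel cuda                   # decode the input on the GPU (NVDEC via CUDA)
  -hwaccel_output_format cuda     # keep decoded frames in GPU memory
  -i /neoncat.mp4                 # input file downloaded above
  -c:a copy                       # copy any audio stream unchanged
  -c:v h264_nvenc                 # encode the video with NVENC
  -b:v 5M                         # target video bitrate of 5 Mbit/s
  /neoncat_out.mp4                # output file
)

# Print the equivalent one-line command.
printf 'ffmpeg %s\n' "${FFMPEG_ARGS[*]}"
```

Running `ffmpeg "${FFMPEG_ARGS[@]}"` executes the same command as above.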

Running that command in a container started with `-nvproxy -nvproxy-docker` fails with the following ffmpeg error:

...
[AVHWDeviceContext @ 0x55d500277300] cu->cuInit(0) failed -> CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS
Device creation failed: -1313558101.
[h264 @ 0x55d500251900] No device available for decoder: device type cuda needed for codec h264.
...

This suggests that the call to cuInit(0) is what fails.
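The number in `Device creation failed: -1313558101.` is less specific than it looks. Assuming the usual definition in ffmpeg's libavutil/error.h, where FFERRTAG(a, b, c, d) is the negated MKTAG packing of four characters, the value is AVERROR_UNKNOWN, i.e. FFERRTAG('U','N','K','N'). So ffmpeg itself only knows that device creation failed, and the concrete cause is the CUDA_ERROR_OPERATING_SYSTEM returned by cuInit(0). A bash sketch that recomputes the code (the function name fferrtag is hypothetical):

```shell
#!/usr/bin/env bash
# Recompute ffmpeg's FFERRTAG(a, b, c, d) = -MKTAG(a, b, c, d), where MKTAG
# packs four characters little-endian: a | b<<8 | c<<16 | d<<24.
fferrtag() {
  local a b c d
  # In bash, printf '%d' "'X" prints the character code of X.
  a=$(printf '%d' "'$1")
  b=$(printf '%d' "'$2")
  c=$(printf '%d' "'$3")
  d=$(printf '%d' "'$4")
  echo $(( -(a | (b << 8) | (c << 16) | (d << 24)) ))
}

# AVERROR_UNKNOWN is FFERRTAG('U','N','K','N').
fferrtag U N K N
```

The arithmetic is 85 + 78·2^8 + 75·2^16 + 78·2^24 = 1313558101, negated.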

The same command succeeds in runc, encoding video correctly.

We pass NVIDIA_DRIVER_CAPABILITIES=all to expose the video capability.
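NVIDIA_DRIVER_CAPABILITIES is read by the NVIDIA container stack as a comma-separated list of driver capabilities (for example compute, utility, video) or the special value all. The CUDA base images typically default to compute,utility, which is why video has to be requested explicitly. The bash helper below is a minimal sketch of checking such a value for the video capability; has_video_cap is a hypothetical name, and this is not the NVIDIA Container Toolkit's own parser. It only checks what was requested, not whether the runtime (runc or runsc) honors it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a NVIDIA_DRIVER_CAPABILITIES-style value
# grants the "video" capability, either explicitly or via "all".
has_video_cap() {
  local caps="$1" cap
  local -a list
  IFS=',' read -r -a list <<< "$caps"
  for cap in "${list[@]}"; do
    cap="${cap// /}"              # tolerate spaces such as "compute, video"
    case "$cap" in
      all|video) return 0 ;;
    esac
  done
  return 1
}

# Check the value the current environment sees (empty if unset).
if has_video_cap "${NVIDIA_DRIVER_CAPABILITIES:-}"; then
  echo "video capability requested"
else
  echo "video capability NOT requested"
fi
```

With Docker, the value is typically set with `-e NVIDIA_DRIVER_CAPABILITIES=all` (or a list that includes video).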

Steps to reproduce

Build an OCI image, for example from this Dockerfile:

FROM nvidia/cuda:12.2.0-devel-ubuntu20.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget ffmpeg
RUN wget -q -O /neoncat.mp4 https://media.giphy.com/media/sIIhZliB2McAo/giphy.mp4

with:

docker build -t ffmpeg-test -f Dockerfile .

Then run it on a system with a GPU available:

docker run --rm --runtime=runsc --gpus=all ffmpeg-test ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i /neoncat.mp4 -c:a copy -c:v h264_nvenc -b:v 5M /neoncat_out.mp4
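Since the same command is reported to work under runc, a small bash helper can run one container command under both runtimes back to back. This is a sketch under these assumptions: both runtimes are registered with Docker as runc and runsc, runsc is configured with the nvproxy flags (for example via runtimeArgs in /etc/docker/daemon.json), and `-e NVIDIA_DRIVER_CAPABILITIES=all` is added because the report says that value is passed. The function name compare_runtimes and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Run the same container command under runc and runsc and report each result.
# Usage: compare_runtimes IMAGE COMMAND [ARGS...]
# Set DRY_RUN=1 to print the docker commands instead of running them.
compare_runtimes() {
  local image="$1"; shift
  local rt status
  for rt in runc runsc; do
    local -a cmd=(docker run --rm "--runtime=${rt}" --gpus=all
                  -e NVIDIA_DRIVER_CAPABILITIES=all "$image" "$@")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "${cmd[*]}"
      continue
    fi
    echo "== ${rt}"
    if "${cmd[@]}"; then status=ok; else status=FAILED; fi
    echo "${rt}: ${status}"
  done
}
```

For example, pass `ffmpeg-test` followed by the full ffmpeg command from above; prefixing the call with `DRY_RUN=1` only prints the two docker invocations.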

runsc version

runsc version release-20230920.0-21-ge81e0c72a70b
spec: 1.1.0-rc.1

ayushr2 commented 11 months ago

We don't support graphics/video capabilities yet.

luiscape commented 11 months ago

Sounds good. Thank you for letting me know.

github-actions[bot] commented 7 months ago

A friendly reminder that this issue had no activity for 120 days.

thundergolfer commented 2 weeks ago

@ayushr2 we may take on the work to add the video capability to nvproxy. Many of our customers are running into this limitation when trying to do GPU-accelerated ffmpeg work. Do you have any thoughts or objections before we do?

ayushr2 commented 2 weeks ago

@thundergolfer We are aligning internally around how to proceed with adding non-CUDA support. Let me get back to you once we have fleshed out the details.

thundergolfer commented 2 weeks ago

> how to proceed with adding non-CUDA support

It'd be the NVIDIA Video Codec SDK that we'd need to support, right?

Please do keep us in the loop :) We'd slotted in this work for mid-September but will of course adjust if it doesn't fit with your plans.

EtiennePerot commented 2 weeks ago

Please see #10856 which needs to happen before non-CUDA ioctls can be added to nvproxy.

EtiennePerot commented 1 week ago

Hi,

As per #10856, nvproxy cannot accept patches for nvenc/nvdec commands until it supports NVIDIA capability segmentation. @ayushr2 and others have started working on this, and we expect it to be done (at least structurally, i.e. the nvproxy ABI definitions will support being tagged by driver capabilities) by early October.
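To make "ABI definitions tagged by driver capabilities" concrete, here is a toy model in bash: each known ioctl carries the capability it belongs to, and a call is allowed only if that capability is enabled, so video (NVENC/NVDEC) commands stay unreachable unless video is requested. The ioctl names are placeholders rather than real NVIDIA driver commands, rejecting unknown ioctls is an assumption for the illustration, and nvproxy itself is written in Go, so its actual design will differ.

```shell
#!/usr/bin/env bash
# Toy capability table. Keys are placeholder ioctl names (not real driver
# commands); values are the driver capability each one would belong to.
declare -A IOCTL_CAP=(
  [EXAMPLE_ALLOC_MEMORY]=compute
  [EXAMPLE_ENCODE_FRAME]=video
  [EXAMPLE_SET_DISPLAY_MODE]=display
)

# Succeed only if the ioctl is known and its capability is enabled in the
# comma-separated list $2 ("all" enables every capability).
ioctl_allowed() {
  local name="$1" caps="$2" need cap
  local -a enabled
  need="${IOCTL_CAP[$name]:-}"
  [[ -n "$need" ]] || return 1    # unknown ioctls are rejected outright
  IFS=',' read -r -a enabled <<< "$caps"
  for cap in "${enabled[@]}"; do
    if [[ "$cap" == all || "$cap" == "$need" ]]; then
      return 0
    fi
  done
  return 1
}
```

With `compute,utility`, EXAMPLE_ENCODE_FRAME is refused; adding `video` to the list (or using `all`) allows it.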

This is a bit later than your planned start date. In the meantime, as part of this work, it would be great if you could also contribute some NVENC/NVDEC regression tests, even if they still fail in gVisor when the PR is merged. This is necessary not just for correctness, but also to ensure long-term maintainability as the NVIDIA driver and userspace libraries change. ffmpeg's h264_nvenc can take care of exercising nvenc, so that should definitely be one such test. Is there something similarly simple we can use for nvdec?
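One possible shape for such tests, sketched in bash: an encode check built on h264_nvenc and a decode-only check that uses ffmpeg's cuda hwaccel (which decodes through NVDEC), both writing to the null muxer so nothing is kept on disk. The function names are hypothetical, and the decode check assumes ffmpeg exits with an error, rather than silently falling back to software decoding, when an explicitly requested cuda hwaccel cannot be set up, which is what the log above suggests.

```shell
#!/usr/bin/env bash
# Sketch of NVENC/NVDEC smoke tests driven by ffmpeg. $1 is a small H.264
# input file available in the test image.

# NVENC: decode on the GPU, encode with h264_nvenc, discard the result.
nvenc_smoke() {
  ffmpeg -hide_banner -nostdin -hwaccel cuda -hwaccel_output_format cuda \
    -i "$1" -an -c:v h264_nvenc -b:v 5M -f null -
}

# NVDEC: decode only; -hwaccel cuda routes H.264 decoding through NVDEC.
nvdec_smoke() {
  ffmpeg -hide_banner -nostdin -hwaccel cuda -i "$1" -an -f null -
}

# Run one named check, print PASS/FAIL, and propagate a failure status.
check() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS ${name}"
  else
    echo "FAIL ${name}"
    return 1
  fi
}
```

For example, `check nvenc nvenc_smoke /neoncat.mp4` and `check nvdec nvdec_smoke /neoncat.mp4`, run once under runc and once under runsc; both are expected to pass under runc.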

thundergolfer commented 1 week ago

Thanks for the reply @EtiennePerot. I've made regression testing the first task under our internal project 👍

EtiennePerot commented 1 week ago

We may be able to reuse gVisor's existing ffmpeg image to avoid creating yet another Dockerfile for this. A regression test using it can be as simple as this.