nschlia / ffmpegfs

FUSE-based transcoding filesystem with video support from many formats to FLAC, MP4, TS, WebM, OGG, MP3, HLS, and others.
https://nschlia.github.io/ffmpegfs/
GNU General Public License v3.0

[FEATURE] Hardware Transcoding #63

Closed: hpmueller1971 closed this issue 2 years ago

hpmueller1971 commented 3 years ago

Hi,

Is it currently possible, somehow, to use nvenc/nvdec or similar hardware acceleration? I'm trying to play (YouTube) 4K VP9 on a Raspberry Pi 4, which only supports H.265 for 4K, but software transcoding is "slightly" too slow (less than 2 fps ;)). With nvdec VP9 decoding and nvenc H.265 encoding on a cheap GTX 1650, I'm getting more than 300 fps :-D.

kind regards, /hp

nschlia commented 3 years ago

If FFmpeg supports the hardware, it should be possible. I have done that before for a Pi (I created a hardware-accelerated version of VLC, see https://www.oblivion-software.de/index.php?id=62). But I have never used HW acceleration with FFmpeg myself; AFAIK it requires getting a hardware handle and using some sort of filter. Probably a lot of work and testing...
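For orientation, the "handle plus filter" pattern looks roughly like this on the FFmpeg command line, sketched here for VAAPI: `-init_hw_device` creates the hardware device, `-filter_hw_device` hands it to the filter graph, and the `hwupload` filter copies frames into GPU memory for the `h264_vaapi` encoder. The device path `/dev/dri/renderD128` and the helper name `vaapi_cmd` are assumptions for illustration; the helper only prints the command so it can be checked before running it.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an FFmpeg command that encodes with VAAPI.
# Assumes an Intel GPU exposed at /dev/dri/renderD128 unless a third argument is given.
vaapi_cmd() {
    src=$1
    dst=$2
    dev=${3:-/dev/dri/renderD128}
    # -init_hw_device creates a device handle named "va",
    # -filter_hw_device makes that handle available to filters,
    # hwupload moves the NV12 frames into GPU memory for h264_vaapi.
    echo ffmpeg -init_hw_device "vaapi=va:$dev" -filter_hw_device va \
        -i "$src" -vf 'format=nv12,hwupload' -c:v h264_vaapi "$dst"
}

vaapi_cmd input.webm output.mp4
```

On a machine where `ffmpeg -hwaccels` lists vaapi, the printed line can be run as-is.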

nschlia commented 3 years ago

ATM I am trying to find out which hardware encoders FFmpeg supports on the Raspberry. There is a description of how to build a HW-encoding-enabled version at Red Hen Lab that I can roughly follow.

It's currently building, can't wait to see what it can do :)

Basically, adding HW encoding to FFmpegfs sounds like a very good idea to me. I use an older board with an i5 CPU, and when more than two or three videos play concurrently, all CPUs are at 100% permanently. Using HW acceleration could fix that, and the overall video quality could be much higher, as it would not require disabling a lot of extensions to get the encoder into real time. Alas, one question is: what happens if more than one thread tries to use the hardware encoder? Will that work? Scratching my head...

It'll only take some time because I need to make some decisions first, meaning things like which HW should be supported, how to select it, and such. But I'll add it to the list.

hpmueller1971 commented 3 years ago

> ATM I am trying to find out which hardware encoders FFmpeg supports on the Raspberry. There is a description of how to build a HW-encoding-enabled version at Red Hen Lab that I can roughly follow.
>
> It's currently building, can't wait to see what it can do :)

Oh dear lord, you're actually building ffmpeg on the raspberry? That has to be fun ;)

If you are using Debian, check out deb-multimedia; they provide prebuilt ffmpeg packages with various hardware frameworks enabled (for amd64 at least NVIDIA and Quick Sync). I think I've seen omx+mmal in the armhf builds, but for some reason ffmpeg crashes on my testing RPi1, so I can't verify that ATM. The omx encoder isn't great quality-wise, but with mmal hardware decoding it would at least be possible to use ffmpegfs on an RPi :)

> Using HW acceleration could fix that, and the overall video quality could be much higher, as it would not require disabling a lot of extensions to get the encoder into real time.

Yeah! With my example (4K VP9 -> 4K H.265) there is no chance of doing that in software at all. The quality depends very much on which HW encoder you're using (they are intended for streaming, not quality, but the Turing NVENC is supposed to be almost as good as ffmpeg CRF, much better than the Volta one in my budget card)...

> Alas, one question is: what happens if more than one thread tries to use the hardware encoder? Will that work? Scratching my head...

At least for NVIDIA, the answer is here. TL;DR: The consumer products are limited to 3 streams per NVENC encoder (of which there are 1-3 on one card), but this limitation is not in the hardware but in the driver, cough ;)

> It'll only take some time because I need to make some decisions first, meaning things like which HW should be supported, how to select it, and such.

On the command line, it's mostly just another codec... As a side note, it also makes a huge difference whether you use hardware decoding, much more than I expected when I bought the card. Example:

** cuvid vp9 decoder and nvenc encoder:
ffmpeg -hwaccel cuvid -c:v vp9_cuvid -i "video.webm" -c:v hevc_nvenc -preset fast -rc:v vbr_hq -cq 32 -b:v 0 -acodec eac3 -scodec copy -threads 1 -y "video.mkv"

-> frame=19168 fps=167 q=26.0 Lsize= 596604kB time=00:13:19.55 bitrate=6112.7kbits/s speed=6.98x

** software vp9 decoder and nvenc encoder:
ffmpeg -i "video.webm" -c:v hevc_nvenc -preset fast -rc:v vbr_hq -cq 32 -b:v 0 -acodec eac3 -scodec copy -threads 1 -y "video.mkv"

-> frame=19168 fps= 79 q=26.0 Lsize= 596604kB time=00:13:19.48 bitrate=6113.2kbits/s speed=3.28x

167 fps vs 79 fps: more than double (about 2.1x) with hardware decoding, while the CPU load is nearly zero with hardware decoding vs 100% on one thread with software decoding :-)
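A quick way to check such comparisons is to pull the fps and speed values out of the two final FFmpeg progress lines and divide them. This is a small sketch: `get_field` is a hypothetical helper, and its pattern allows the space FFmpeg inserts after `=` for short values (`fps= 79`).

```shell
#!/bin/sh
# Hypothetical helper: extract one numeric field (e.g. fps, speed) from an
# FFmpeg final progress line. FFmpeg pads short values ("fps= 79"), so the
# pattern allows optional spaces after the "=".
get_field() {
    printf '%s\n' "$2" | sed -E "s/.*$1= *([0-9.]+).*/\1/"
}

# The two result lines quoted above: hardware decode (hw) and software decode (sw).
hw='frame=19168 fps=167 q=26.0 Lsize= 596604kB time=00:13:19.55 bitrate=6112.7kbits/s speed=6.98x'
sw='frame=19168 fps= 79 q=26.0 Lsize= 596604kB time=00:13:19.48 bitrate=6113.2kbits/s speed=3.28x'

for f in fps speed; do
    a=$(get_field "$f" "$hw")
    b=$(get_field "$f" "$sw")
    # POSIX sh only has integer arithmetic, so let awk do the division.
    awk -v f="$f" -v a="$a" -v b="$b" 'BEGIN { printf "%s: %s vs %s -> %.2fx\n", f, a, b, a / b }'
done
```

Both ratios come out at roughly 2.1 (2.11x for fps, 2.13x for speed), so hardware decoding slightly more than doubles throughput in this run.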

> But I'll add it to the list.

Very cool, thanks :-)

nschlia commented 3 years ago

> Oh dear lord, you're actually building ffmpeg on the raspberry? That has to be fun ;)

As a matter of fact, it's easy. But admittedly I know my way around building FFmpeg by now; I have built it a thousand times on several target machines for my FFmpegfs tests :)

To put it that way, I am too lazy to cross compile and copy the files every time...

> If you are using Debian, check out deb-multimedia; they provide prebuilt ffmpeg packages with various hardware frameworks enabled (for amd64 at least NVIDIA and Quick Sync),

That won't help me I guess because I need to figure out how to use HW acceleration using the FFmpeg API. No problem, anyways.

> I think I've seen omx+mmal in the armhf builds, but for some reason ffmpeg crashes on my testing RPi1, so I can't verify that ATM. The omx encoder isn't great quality-wise, but with mmal hardware decoding it would at least be possible to use ffmpegfs on an RPi :)

My mmal version has built; I have yet to try it out. You could do it yourself: I'll post a recipe here for building a minimal FFmpeg version with some HW encoders/decoders.

> Alas, one question is: what happens if more than one thread tries to use the hardware encoder? Will that work? Scratching my head...

> At least for NVIDIA, the answer is here. TL;DR: The consumer products are limited to 3 streams per NVENC encoder (of which there are 1-3 on one card), but this limitation is not in the hardware but in the driver, cough ;)

Well OK, then 3 HW-accelerated streams plus a fallback to software would make 5-6 streams on my machine. That'll be OK. And as FFmpeg is open source, I might be able to remove the 3-stream limitation, if it's not inside the hardware.
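The "3 hardware streams, then software" plan can be sketched as a simple slot counter. Assumptions: the limit of 3 comes from the driver discussion above, `hevc_nvenc` is FFmpeg's NVENC HEVC encoder, and `libx265` stands in for the software fallback; `pick_encoder` is a hypothetical helper, not FFmpegfs code.

```shell
#!/bin/sh
# Hypothetical helper: choose the encoder for the next stream.
# $1 = hardware sessions already in use, $2 = session limit (3 per the thread).
pick_encoder() {
    active_hw=$1
    max_hw=$2
    if [ "$active_hw" -lt "$max_hw" ]; then
        echo hevc_nvenc   # a hardware slot is free
    else
        echo libx265      # limit reached: fall back to software
    fi
}

# Simulate six streams starting one after another; slots are never released here.
active=0
i=1
while [ "$i" -le 6 ]; do
    enc=$(pick_encoder "$active" 3)
    if [ "$enc" = hevc_nvenc ]; then
        active=$((active + 1))
    fi
    echo "stream $i: $enc"
    i=$((i + 1))
done
```

Streams 1-3 get `hevc_nvenc` and streams 4-6 fall back to `libx265`. The sketch is single-threaded; with several transcoder threads, the counter would have to be protected by a lock and decremented when a stream finishes, which is exactly the concurrency question raised above.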

> But I'll add it to the list.

> Very cool, thanks :-)

You're welcome.

nschlia commented 3 years ago

Maybe you want to build FFmpeg with mmal/omx HW acceleration for PI yourself:

Add some repos to your sources.list (maybe not required, I did not check):

vi /etc/apt/sources.list

Add these lines:

deb-src http://mirror.ox.ac.uk/sites/archive.raspbian.org/archive/raspbian/ buster main contrib non-free rpi
deb-src http://archive.raspbian.org/raspbian/ buster main contrib non-free rpi

Do...

# aptitude update
# aptitude dist-upgrade

Add the required libs...

apt-get install build-essential yasm git libx264-dev libvpx-dev

Pull FFmpeg sources:

git clone --depth=1 git://source.ffmpeg.org/ffmpeg.git

Start build, grab some coffee or a beer...

./configure --enable-gpl --enable-libx264 --enable-nonfree --enable-mmal --enable-omx --enable-omx-rpi  --prefix=/usr --enable-libvpx --extra-ldflags="-latomic"
make -j4

On my system...

# ./ffmpeg -decoders | grep mmal
ffmpeg version git-2020-07-17-3a37aa5 Copyright (c) 2000-2020 the FFmpeg developers
 V..... h264_mmal            h264 (mmal) (codec h264)
 V..... mpeg2_mmal           mpeg2 (mmal) (codec mpeg2video)
 V..... mpeg4_mmal           mpeg4 (mmal) (codec mpeg4)
 V..... vc1_mmal             vc1 (mmal) (codec vc1)

# ./ffmpeg -decoders | grep vp9
ffmpeg version git-2020-07-17-3a37aa5 Copyright (c) 2000-2020 the FFmpeg developers
 VFS..D vp9                  Google VP9
 V..... vp9_v4l2m2m          V4L2 mem2mem VP9 decoder wrapper (codec vp9)
 V..... libvpx-vp9           libvpx VP9 (codec vp9)

# ./ffmpeg -encoders | grep mmal

# ./ffmpeg -decoders | grep omx

# ./ffmpeg -encoders | grep omx
 V..... h264_omx             OpenMAX IL H.264 video encoder (codec h264)
 V..... mpeg4_omx            OpenMAX IL MPEG-4 video encoder (codec mpeg4)

It seems that I can decode h264, mpeg2, mpeg4 and vc1 using mmal, but vp9 only in software. mmal does not seem to support encoding.

omx supports no decoding, only h264 and mpeg4 encoding.
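Turning that finding into a selection rule could look like the sketch below: prefer `<codec>_mmal` when `ffmpeg -decoders` lists it, otherwise fall back to the software decoder. `pick_decoder` is hypothetical, and it assumes the software decoder has the same name as the codec, which holds for h264 and vp9 but not for MPEG-2 (FFmpeg calls that decoder `mpeg2video`).

```shell
#!/bin/sh
# Hypothetical helper: read "ffmpeg -decoders" output on stdin and print the
# hardware decoder "<codec>_<suffix>" if it is listed, else the codec name
# (assumed to be the software decoder's name).
pick_decoder() {
    codec=$1    # e.g. h264, vp9
    suffix=$2   # hardware wrapper suffix, e.g. mmal
    # The second whitespace-separated column of each listing line is the decoder name.
    if awk -v want="${codec}_${suffix}" '$2 == want { found = 1 } END { exit !found }'; then
        echo "${codec}_${suffix}"
    else
        echo "$codec"
    fi
}

# A few decoder lines as printed by the Pi 3 build above.
listing=' V..... h264_mmal            h264 (mmal) (codec h264)
 V..... mpeg2_mmal           mpeg2 (mmal) (codec mpeg2video)
 VFS..D vp9                  Google VP9
 V..... vp9_v4l2m2m          V4L2 mem2mem VP9 decoder wrapper (codec vp9)'

for c in h264 vp9; do
    printf '%s -> %s\n' "$c" "$(printf '%s\n' "$listing" | pick_decoder "$c" mmal)"
done
```

This prints `h264 -> h264_mmal` and `vp9 -> vp9`, matching the observation that vp9 is decoded in software only on this build. On a real system, the listing would come from `./ffmpeg -decoders` instead of the embedded sample.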

I bought keys for mpeg2/4, h264 and vc1, so probably my PI could support more if I bought keys...

Example result using -c:v h264 (software):

frame= 92 fps=7.5 q=-1.0 Lsize= 1203kB time=00:00:03.56 bitrate=2769.3kbits/s speed=0.29x

CPU Load 4x 100%

Example result using -c:v h264_omx (hardware):

frame= 423 fps= 98 q=-0.0 Lsize= 939kB time=00:00:16.88 bitrate= 455.6kbits/s speed=3.93x

CPU Load 4x 50% (on a busy system, typical load 2x 30-50%)

nschlia commented 3 years ago

Looks good, so I'll add this to my wish list. Maybe I need to buy an Nvidia card or similar for my server (although it is running headless). But that could speed up everything a lot.

The test machine is a Raspberry Pi 3 Model B Plus Rev 1.3, which controls the power consumption of a fridge to allow it to run on solar power (every time the compressor kicked in, the solar controller's circuit breaker tripped...). Anyway, there's an almost real-time process running on it to limit the fridge's current, so the "idle" load is quite high.

hpmueller1971 commented 3 years ago

Yeah, I was talking about the compile time; at least on the RPi1 I'm using for testing, building a full-blown package with x264, x265 etc. takes half a day ;)

I see. I refrain from building everything, as I usually only need H264, WebM, AAC and MP3 for FFmpegfs; building the above version of FFmpeg on my Pi 3 only took a few minutes. My requirement is speed, so I usually build it with CPU-specific optimisations.

> That won't help me I guess because I need to figure out how to use HW acceleration using the FFmpeg API. No problem, anyways.

Ah, OK, I assumed that you link against the libraries...

I do, but I need to figure out how to use the hardware codecs. Off the top of my head, I remember it requires a hardware handle from the API and some extra coding.

No need, the "cough" above is a link to the nvidia-patch, which removes that limitation :).

Cool. That means a HW support would make sense. Thanks for the info.

Gainward GeForce GTX 1650 Super Pegasus OC (4 GB) for ~150 €...

Probably I'll get one. I have a personal aversion to Nvidia, which I call "nie wieder" (sounds a bit like "nee-veedia" and means "never again" in German; I had some bad experiences with Nvidia cards).

I also had some bad experiences with HW acceleration: I bought a firewall router with HW support for AES encryption, only to find out that the VPN throughput was lower when HW acceleration was on. The software encoder provided higher throughput with a bit more CPU consumption. I ended up using Camellia software encryption, which was even faster and used less CPU power. You never know how things turn out in the end. :)

nschlia commented 3 years ago

With commit 10f7cac, experimental hardware encoder support has been added. Please check HWACCEL to see what's implemented and what's still missing.

nschlia commented 3 years ago

With commit a3b05622e289c58abe6deaa6829a37559942b830 and a few more, hardware encoding can be controlled from the command line.

Currently only VAAPI (Debian) and OMX (Raspbian) have been tested.

nschlia commented 3 years ago

Currently, VAAPI, MMAL and OpenMAX have been tested. Will also check CUDA and Video Mem2Mem.

| Code | API name | Support |
|---|---|---|
| HWACCELAPI_VAAPI | Intel: VAAPI | OK |
| HWACCELAPI_MMAL | Raspberry: MMAL | OK |
| HWACCELAPI_OMX | Raspberry: OpenMAX | OK |
| HWACCELAPI_CUDA | Nvidia: CUDA | to be tested |
| HWACCELAPI_V4L2M2M | v4l2 mem to mem (Video4linux) | to be tested |
| HWACCELAPI_VDPAU | VDPAU | not supported |
| HWACCELAPI_QSV | QSV | not supported |
| HWACCELAPI_OPENCL | OPENCL | not supported |
| HWACCELAPI_VULKAN | VULKAN | not supported |
| HWACCELAPI_VIDEOTOOLBOX | Video Toolbox | not supported (macOS only) https://developer.apple.com/documentation/videotoolbox |
| HWACCELAPI_MEDIACODEC | MediaCodec API | not supported (Android only) https://android-doc.github.io/reference/android/media/MediaCodec.html |
| HWACCELAPI_DRM | DRM | not supported |
| HWACCELAPI_DXVA2 | DXVA2 | not supported (Windows only) |
| HWACCELAPI_D3D11VA | D3D11VA | not supported (Windows only) |
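
In FFmpeg, the encoders behind these APIs are mostly named `<codec>_<suffix>` (for example `h264_vaapi`, `h264_omx`, `h264_nvenc`, `h264_v4l2m2m`). A sketch of that lookup for the tested and to-be-tested entries follows; `hw_encoder_name` is hypothetical and not taken from FFmpegfs, and the naming rule is an assumption that does not hold for every codec/API pair (there is an `h264_omx` but no `hevc_omx`), so the result should still be checked against `ffmpeg -encoders`.

```shell
#!/bin/sh
# Hypothetical mapping from HWACCELAPI_* codes in the table to FFmpeg encoder names.
# Assumes the "<codec>_<suffix>" naming; returns 1 if no hardware encoder applies.
hw_encoder_name() {
    api=$1
    codec=$2
    case $api in
        HWACCELAPI_VAAPI)   suffix=vaapi ;;
        HWACCELAPI_OMX)     suffix=omx ;;
        HWACCELAPI_CUDA)    suffix=nvenc ;;    # NVENC encoders, e.g. h264_nvenc
        HWACCELAPI_V4L2M2M) suffix=v4l2m2m ;;
        HWACCELAPI_MMAL)    return 1 ;;        # MMAL only offered decoders on the Pi build above
        *)                  return 1 ;;        # not covered by this sketch
    esac
    echo "${codec}_${suffix}"
}

for api in HWACCELAPI_VAAPI HWACCELAPI_OMX HWACCELAPI_MMAL; do
    printf '%s: %s\n' "$api" "$(hw_encoder_name "$api" h264 || echo '(none, use software)')"
done
```

For h264 this yields `h264_vaapi`, `h264_omx`, and no hardware encoder for MMAL, in line with the earlier finding that mmal does decoding only.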
nschlia commented 2 years ago

And finally, the feature has been implemented, tested and rolled out. It will be available in Debian & co. with the next update release, V2.6.

Cheers to @hpmueller1971 for this incredible idea. This is a big performance boost for hardware supported formats!