Closed hpmueller1971 closed 2 years ago
If FFmpeg supports the hardware it should be possible. I have done that before for a Pi (I created a hardware-accelerated version of VLC, see https://www.oblivion-software.de/index.php?id=62). But I never used HW acceleration with FFmpeg myself; AFAIK it requires getting a handle and using some sort of filter. Probably a lot of work and testing...
ATM I am trying to find out which hardware encoders FFmpeg supports on the Raspberry Pi. There is a description of how to build a HW-encoding-enabled version at Red Hen Lab that I can roughly follow.
It's currently building, can't wait to see what it can do :)
Basically, adding HW encoding to FFmpegfs sounds like a very good idea to me. I use an older board with an i5 CPU, and when more than two or three videos play concurrently all CPUs are permanently at 100%. Using HW acceleration could fix that, and the overall video quality could be much higher, as it would no longer be necessary to disable a lot of extensions to get the encoder into real-time. Alas, one question is, what happens if more than one thread tries to use the hardware encoder. Will that work? Scratching my head...
It'll take some time, though, because I need to make some decisions first, such as which HW should be supported and how to select it. But I'll add it to the list.
> ATM I am trying to find out which hardware encoders FFmpeg supports on the Raspberry Pi. There is a description of how to build a HW-encoding-enabled version at Red Hen Lab that I can roughly follow.
> It's currently building, can't wait to see what it can do :)
Oh dear lord, you're actually building ffmpeg on the raspberry? That has to be fun ;)
If you are using Debian, check out deb-multimedia; they provide prebuilt ffmpeg packages with various hardware frameworks enabled (for amd64 at least NVIDIA and Quick Sync). I think I've seen omx+mmal in the armhf builds, but for some reason ffmpeg crashes on my testing RPi1, so I can't verify that ATM. The omx encoder isn't great quality-wise, but with mmal hardware decoding it would at least be possible to use ffmpegfs on an RPi :)
> Using HW acceleration could fix that, and the overall video quality could be much higher, as it would no longer be necessary to disable a lot of extensions to get the encoder into real-time.
Yeah! With my example (4K VP9 -> 4K H265) there is no chance of doing that in software at all. The quality depends very much on which HW encoder you're using (they are intended for streaming rather than quality, but the Turing NVENC is supposed to be almost as good as ffmpeg CRF, much better than the Volta one in my budget card)...
> Alas, one question is, what happens if more than one thread tries to use the hardware encoder. Will that work? Scratching my head...
At least for NVIDIA, the answer is here. TL;DR: the consumer products are limited to 3 streams per NVENC encoder (of which there are 1-3 on one card), but this limitation is not inside the hardware but in the driver, cough ;)
> It'll take some time, though, because I need to make some decisions first, such as which HW should be supported and how to select it.
On the command line, it's mostly just another codec... As a side note, it also makes a huge difference whether you use hardware decoding, much more than I expected when I bought the card. Example:
cuvid vp9 decoder and nvenc encoder:
ffmpeg -hwaccel cuvid -c:v vp9_cuvid -i "video.webm" -c:v hevc_nvenc -preset fast -rc:v vbr_hq -cq 32 -b:v 0 -acodec eac3 -scodec copy -threads 1 -y "video.mkv"
-> frame=19168 fps=167 q=26.0 Lsize= 596604kB time=00:13:19.55 bitrate=6112.7kbits/s speed=6.98x
software vp9 decoder and nvenc encoder:
ffmpeg -i "video.webm" -c:v hevc_nvenc -preset fast -rc:v vbr_hq -cq 32 -b:v 0 -acodec eac3 -scodec copy -threads 1 -y "video.mkv"
-> frame=19168 fps= 79 q=26.0 Lsize= 596604kB time=00:13:19.48 bitrate=6113.2kbits/s speed=3.28x
167 fps vs 79 fps, a bit more than double with hardware decoding, while CPU load is nearly zero with hardware decoding vs 100% on one thread with software decoding :-)
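The ratio between the two runs can be read straight off the ffmpeg status lines quoted above. A minimal sketch, assuming the status line format shown in this thread; the helper names `fps_of` and `speedup` are made up for illustration:

```shell
#!/bin/sh
# Extract the fps value from an ffmpeg status line such as
# "frame=19168 fps=167 q=26.0 ..." (ffmpeg may pad with spaces after "fps=").
fps_of() {
    printf '%s\n' "$1" | sed -n 's/.*fps= *\([0-9.][0-9.]*\).*/\1/p'
}

# Print how many times faster the first run was than the second.
speedup() {
    awk -v a="$(fps_of "$1")" -v b="$(fps_of "$2")" 'BEGIN { printf "%.2f\n", a / b }'
}

# The two status lines from the runs above.
hw='frame=19168 fps=167 q=26.0 Lsize= 596604kB time=00:13:19.55 bitrate=6112.7kbits/s speed=6.98x'
sw='frame=19168 fps= 79 q=26.0 Lsize= 596604kB time=00:13:19.48 bitrate=6113.2kbits/s speed=3.28x'

speedup "$hw" "$sw"   # prints 2.11
```

So on this card hardware decoding roughly doubles the throughput of the same NVENC encode.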
> But I'll add it to the list.
Very cool, thanks :-)
> Oh dear lord, you're actually building ffmpeg on the raspberry? That has to be fun ;)
As a matter of fact, it's easy. But admittedly I know my way around building FFmpeg by now. I have built it a thousand times on several target machines for my FFmpegfs tests :)
To put it that way, I am too lazy to cross compile and copy the files every time...
> If you are using Debian, check out deb-multimedia; they provide prebuilt ffmpeg packages with various hardware frameworks enabled (for amd64 at least NVIDIA and Quick Sync),
That won't help me I guess because I need to figure out how to use HW acceleration using the FFmpeg API. No problem, anyways.
> I think I've seen omx+mmal in the armhf builds, but for some reason ffmpeg crashes on my testing RPi1, so I can't verify that ATM. The omx encoder isn't great quality-wise, but with mmal hardware decoding it would at least be possible to use ffmpegfs on an RPi :)
My mmal version has finished building; I still have to try it out. You could do it yourself, I'll post a recipe here for building a minimal FFmpeg version with some HW encoders/decoders.
> > Alas, one question is, what happens if more than one thread tries to use the hardware encoder. Will that work? Scratching my head...
> At least for NVIDIA, the answer is here. TL;DR: the consumer products are limited to 3 streams per NVENC encoder (of which there are 1-3 on one card), but this limitation is not inside the hardware but in the driver, cough ;)
Well OK, then 3 HW-accelerated streams and a fallback to software would make 5-6 streams on my machine. That'll be OK. And as FFmpeg is open source I might be able to remove the 3-stream limitation, if it's not inside the hardware.
> > But I'll add it to the list.
> Very cool, thanks :-)
You're welcome.
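The "3 HW streams, then fall back to software" idea discussed above can be sketched on the command-line level. This is only an illustration, not how FFmpegfs does it: the slot directory, the `MAX_HW` value of 3 (taken from the driver limit mentioned in this thread) and the choice of `hevc_nvenc` with `libx265` as fallback are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: cap concurrent hardware encodes using lock directories.
# MAX_HW mirrors the 3-session limit mentioned in the thread (an assumption).
MAX_HW=3

# Try to claim one of MAX_HW slots below directory $1.
# Prints "<encoder> <slot>". mkdir is atomic, so two processes can
# never claim the same slot.
choose_encoder() {
    slot_dir=$1
    i=1
    while [ "$i" -le "$MAX_HW" ]; do
        if mkdir "$slot_dir/slot$i" 2>/dev/null; then
            echo "hevc_nvenc slot$i"
            return 0
        fi
        i=$((i + 1))
    done
    # All hardware slots are taken: use the software encoder instead.
    echo "libx265 none"
}

# Give the slot back once the transcode has finished.
release_slot() {
    [ "$2" = "none" ] || rmdir "$1/$2"
}

# Usage (not executed here):
#   slots=$(mktemp -d)
#   set -- $(choose_encoder "$slots")      # $1 = encoder, $2 = slot
#   ffmpeg -i in.webm -c:v "$1" out.mkv
#   release_slot "$slots" "$2"
```

Inside a single multi-threaded process a counting semaphore would do the same job; the directory trick is just the simplest way to show it in a shell.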
Maybe you want to build FFmpeg with mmal/omx HW acceleration for PI yourself:
Add some repos to your sources.list (maybe not required, I did not check):
vi /etc/apt/sources.list
Add these lines:
deb-src http://mirror.ox.ac.uk/sites/archive.raspbian.org/archive/raspbian/ buster main contrib non-free rpi
deb-src http://archive.raspbian.org/raspbian/ buster main contrib non-free rpi
Do...
# aptitude update
# aptitude dist-upgrade
Add the required libs...
apt-get install build-essential yasm git libx264-dev libvpx-dev
Pull FFmpeg sources:
git clone --depth=1 git://source.ffmpeg.org/ffmpeg.git
Start build, grab some coffee or a beer...
cd ffmpeg
./configure --enable-gpl --enable-libx264 --enable-nonfree --enable-mmal --enable-omx --enable-omx-rpi --prefix=/usr --enable-libvpx --extra-ldflags="-latomic"
make -j4
On my system...
# ./ffmpeg -decoders | grep mmal
ffmpeg version git-2020-07-17-3a37aa5 Copyright (c) 2000-2020 the FFmpeg developers
V..... h264_mmal h264 (mmal) (codec h264)
V..... mpeg2_mmal mpeg2 (mmal) (codec mpeg2video)
V..... mpeg4_mmal mpeg4 (mmal) (codec mpeg4)
V..... vc1_mmal vc1 (mmal) (codec vc1)
# ./ffmpeg -decoders | grep vp9
ffmpeg version git-2020-07-17-3a37aa5 Copyright (c) 2000-2020 the FFmpeg developers
VFS..D vp9 Google VP9
V..... vp9_v4l2m2m V4L2 mem2mem VP9 decoder wrapper (codec vp9)
V..... libvpx-vp9 libvpx VP9 (codec vp9)
# ./ffmpeg -encoders | grep mmal
# ./ffmpeg -decoders | grep omx
# ./ffmpeg -encoders | grep omx
V..... h264_omx OpenMAX IL H.264 video encoder (codec h264)
V..... mpeg4_omx OpenMAX IL MPEG-4 video encoder (codec mpeg4)
Seems that I can decode h264, mpeg2, mpeg4 and vc1 using mmal. vp9 by software only. mmal seems not to support encoding.
omx supports no decoding, but h264 and mpeg4 encoding.
I bought keys for mpeg2/4, h264 and vc1, so probably my Pi could support more if I bought more keys...
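The grep-by-hand check above can be wrapped in a small filter. A minimal sketch, assuming the listing format shown in the output above (capability flags in the first column, codec name in the second); `codecs_for_api` is a made-up helper name:

```shell
#!/bin/sh
# Read `ffmpeg -decoders` or `ffmpeg -encoders` output on stdin and print
# the codec names that end in the given API suffix, e.g. "mmal" or "omx".
codecs_for_api() {
    awk -v suffix="_$1" '
        # Codec lines look like " V..... h264_mmal  h264 (mmal) (codec h264)":
        # field 1 holds the capability flags, field 2 the codec name.
        $1 ~ /^[VAS][A-Z.]+$/ && $2 ~ (suffix "$") { print $2 }
    '
}

# Usage (not executed here):
#   ./ffmpeg -hide_banner -decoders | codecs_for_api mmal
#   ./ffmpeg -hide_banner -encoders | codecs_for_api omx
```

An empty result for a given API means the build offers nothing in that direction, as with the mmal encoders and omx decoders above.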
Example result using -c:v h264 (software):
frame= 92 fps=7.5 q=-1.0 Lsize= 1203kB time=00:00:03.56 bitrate=2769.3kbits/s speed=0.29x
CPU Load 4x 100%
Example result using -c:v h264_omx (hardware):
frame= 423 fps= 98 q=-0.0 Lsize= 939kB time=00:00:16.88 bitrate= 455.6kbits/s speed=3.93x
CPU Load 4x 50% (on a busy system, typical load 2x 30-50%)
Looks good, so I'll add this to my wish list. Maybe I need to buy an NVIDIA card or so for my server (although it is running headless).
The test machine is a Raspberry Pi 3 Model B Plus Rev 1.3 which controls the power consumption of a fridge to allow it to run on solar power (every time the compressor kicked in, the solar controller's circuit breaker tripped...). Anyway, there's an almost real-time process running on it to limit the fridge's current, so the "idle" load is quite high.
> Yeah, i was talking about the compile time, at least on my RPi1 i'm using for testing, building a full-blown package with x264, x265 etc takes half a day ;)
I see. I refrain from building everything as I usually only need H264, WebM, AAC and MP3 for FFmpegfs. Building the above version of FFmpeg on my Pi 3 only took a few minutes. My requirement is speed, so I usually build it with CPU-specific optimisations.
> > That won't help me I guess because I need to figure out how to use HW acceleration using the FFmpeg API. No problem, anyways.
> ah, ok, i assumed that you link against the libraries...
I do, but I need to figure out how to use the hardware codecs. Off the top of my head, I remember it requires a hardware handle from the API and some extra coding.
> No need, the cough above is a link to the nvidia-patch which removes that limitation :).
Cool. That means HW support would make sense. Thanks for the info.
Gainward Geforce GTX 1650Super Pegasus OC (4GB) for ~150€)...
Probably I'll get one. I have a personal aversion to NVIDIA, which I call "nie wieder" (sounds a bit like nee-veedia and means "never again" in German). Had some bad experiences with NVIDIA cards.
I also had some bad experiences with HW acceleration: I bought a firewall router with HW support for AES encryption, only to find out that the VPN throughput was lower when HW acceleration was on. The software implementation provided higher throughput with a bit more CPU consumption. Ended up using Camellia software encryption, which was even faster and used less CPU power. You never know how things turn out in the end. :)
With commit 10f7cac experimental hardware encoder support has been added. Please check HWACCEL for what's implemented and what's still missing.
With commit a3b05622e289c58abe6deaa6829a37559942b830 and a few more, hardware encoding can be controlled from the command line.
Currently only VAAPI (Debian) and OMX (Raspbian) have been tested.
Currently, VAAPI, MMAL and OpenMAX have been tested. Will also check CUDA and Video Mem2Mem.
Code | API Name | Support | Reference |
---|---|---|---|
HWACCELAPI_VAAPI | Intel: VAAPI | OK | |
HWACCELAPI_MMAL | Raspberry: MMAL | OK | |
HWACCELAPI_OMX | Raspberry: OpenMAX | OK | |
HWACCELAPI_CUDA | NVIDIA: CUDA | to be tested | |
HWACCELAPI_V4L2M2M | v4l2 mem to mem (Video4linux) | to be tested | |
HWACCELAPI_VDPAU | VDPAU | not supported | |
HWACCELAPI_QSV | QSV | not supported | |
HWACCELAPI_OPENCL | OPENCL | not supported | |
HWACCELAPI_VULKAN | VULKAN | not supported | |
HWACCELAPI_VIDEOTOOLBOX | Video Toolbox | not supported (MacOS only) | https://developer.apple.com/documentation/videotoolbox |
HWACCELAPI_MEDIACODEC | MediaCodec API | not supported (Android only) | https://android-doc.github.io/reference/android/media/MediaCodec.html |
HWACCELAPI_DRM | DRM | not supported | |
HWACCELAPI_DXVA2 | DXVA2 | not supported (Windows only) | |
HWACCELAPI_D3D11VA | D3D11VA | not supported (Windows only) | |
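To make the table a bit more concrete, here is a rough sketch of how the API codes relate to FFmpeg's H.264 encoder names on the command line. The mapping is an illustration only and not taken from the FFmpegfs sources; the helper names are made up, and pairing CUDA with `h264_nvenc` is an assumption. MMAL is left out on purpose because, as seen earlier in the thread, it only provides decoders.

```shell
#!/bin/sh
# Map an API code from the table to FFmpeg's H.264 hardware encoder name.
# Returns non-zero for APIs that offer no hardware encoder in this sketch.
h264_encoder_for() {
    case $1 in
        HWACCELAPI_VAAPI)   echo h264_vaapi ;;
        HWACCELAPI_OMX)     echo h264_omx ;;
        HWACCELAPI_CUDA)    echo h264_nvenc ;;
        HWACCELAPI_V4L2M2M) echo h264_v4l2m2m ;;
        *)                  return 1 ;;
    esac
}

# Fall back to the software encoder when no hardware encoder is available.
pick_h264_encoder() {
    h264_encoder_for "$1" || echo libx264
}

# Usage (not executed here):
#   ffmpeg -i in.mkv -c:v "$(pick_h264_encoder HWACCELAPI_OMX)" out.mp4
```

Whether a given encoder actually exists still depends on how FFmpeg was built, so checking `ffmpeg -encoders` first remains a good idea.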
And finally, the feature has been implemented, tested and rolled out. Will be available in Debian & co with the next update release as V2.6.
Cheers to @hpmueller1971 for this incredible idea. This is a big performance boost for hardware supported formats!
Hi,
is it currently somehow possible to use nvenc/nvdec or similar hardware acceleration? I'm trying to play (YouTube) 4K VP9 on a Raspberry Pi 4, which only supports H.265 for 4K, but software transcoding is "slightly" too slow (less than 2 fps ;)), while with nvdec VP9 decoding and nvenc H.265 encoding on a cheap GTX 1650 I'm getting more than 300 fps :-D.
kind regards, /hp