routerino opened 3 years ago
Hi! No. Hardware acceleration support is not yet implemented.
Are there any plans to implement hardware acceleration? From my understanding, specifying -hwaccel vaapi -hwaccel_device /dev/dri/renderD128
is enough for the FFmpeg options in my case. However, when I specify these options, the CCTV Viewer application automatically removes the -hwaccel_device /dev/dri/renderD128
option, making it unusable.
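For comparison, the same two options can be exercised with the standalone ffmpeg CLI as a decode-only benchmark. This is a hedged sketch: the RTSP URL is a placeholder and the /dev/dri/renderD128 render node is an assumption that depends on your GPU setup.

```shell
# Decode-only test with VA-API; discards frames so only decode cost shows.
# Replace the URL and render node with your own.
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
       -rtsp_transport tcp \
       -i "rtsp://camera.example/stream" \
       -f null -
```

If CPU usage drops markedly here compared to a run without the -hwaccel options, the hardware path works on your machine and the limitation is in the application, not the driver stack.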
Hardware acceleration is not currently implemented. There is no point in trying to pass FFmpeg parameters via the command line or any other way. And yes, I have plans to implement hardware acceleration, but it's not a priority right now.
So, while I understand this is a huge undertaking and not something to expect soon, I did some comparing to see the benefit of this. On a RasPi 4 with 4GB, running the same four 1280x720 15fps H264 streams in both omxplayer and cctv-viewer:
omxplayer: 16% CPU total (4% per process), 66 Celsius core temp. cctv-viewer: 260% CPU total, 85 Celsius core temp (temp warning flashing and I suspect it is throttling).
I understand (well, suspect, I do not really know) building full ffmpeg hardware support is not easy, but the benefit is rather significant.
In the interim, would it be possible to get some script API script call, with URL and screen coordinates, and if not found or non-zero return, continue with native ffmpeg processing?
Mind you, for now I am simply using cctv-viewer with the 640x480 substreams, and it runs fine with 4 of those. The recorder is still recording at full resolution, so this is quite acceptable for the monitoring station. I guess only when enlarging a stream would it be nice if it swapped to the main resolution temporarily.
Eventually I want to use a RasPi 3 however (those mini-HDMI are a pain), but I do not have one around for testing. I am hoping that too can carry 4 sub-streams OK.
PS: command used: omxplayer --win 0,0,959,539 --avdict 'rtsp_transport:tcp' <url>
Hi! I think support for hardware acceleration will appear with the porting of the application to Qt 6.2. But in any case it will be a compromise solution. I expect that different platforms support different numbers of simultaneously hardware-decoded video streams. In my case each preset contains up to 16 streams... I don't think every platform can provide the ability to decode such a set of data, and this case will need to be handled correctly somehow.
Hi. Would it be possible to implement basic hardware acceleration? Something that would be disabled by default and could then be enabled on a per-camera basis using command line arguments (flags)? I am currently running this app as a 4x4 matrix on an Intel NUC, and the NUC gets pretty hot. Hardware acceleration would be greatly appreciated!
Even experimental hardware acceleration would be super awesome!
Hi. Any plans for hardware acceleration, even the basic one?
+1 for hardware-accelerated decoding
It would be really interesting to "squeeze out" the full HW capabilities to obtain better performance (and, above all, latency)!
We suggest you check @rigaya's repos to understand what you could obtain: https://github.com/rigaya
Meanwhile you could try the -hwaccel auto FFmpeg parameter.
Last but not least, you can also grab some other ideas/approaches about (Linux) HW video decoding in this wiki page: https://github.com/opencv/opencv/wiki/Video-IO-hardware-acceleration
Hope that inspires!
Hi all! I hasten to inform you that the built-in player has been radically redesigned and hardware accelerated video decoding is currently available in experimental mode, as well as Zero-copy rendering for X11 desktops.
Hardware acceleration is controlled using the corresponding FFmpeg options:
1) -hwaccel [method]
2) -hwaccel_output [backend]
These options can be set globally in the application settings and for each viewport.
I recommend starting with the single option -hwaccel. In this case, decoding will be performed in hardware, but rendering will be done by copying frames to system memory.
The list of available decoding methods for your system can be obtained with the command: $ ffmpeg -hwaccels
Currently only the vaapi method is fully tested. For all the others you will probably need to install some libraries. Something is guaranteed not to work. I suspect that it will be drm.
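The two checks above can be run together. This is a hedged sketch: package names vary per distribution, and vainfo only applies to the VA-API path.

```shell
# 1) List the hardware decode methods compiled into your FFmpeg build:
ffmpeg -hide_banner -hwaccels

# 2) For the vaapi method, verify a working VA-API driver and see which
#    codec profiles it supports (vainfo ships in the "vainfo" or
#    "libva-utils" package on Debian/Ubuntu-family distributions):
vainfo
```

If vainfo fails or lists no profiles for your codec, hardware decoding will silently fall back to the CPU, as described below.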
Please note that hardware decoding by its nature has hard restrictions. For example, my hardware does not support Baseline profile for h264 decoder. In this case you will not see any messages, decoding will continue on CPU. Indication will be added in the future.
In general, hardware decoding with frame copying to system memory saves RAM. However, in some cases due to copying large amounts of data CPU load may even increase. In my case with a large number of viewports I notice a significant saving of all resources.
However, the full potential of hardware decoding is revealed only with Zero-copy rendering!
Use the -hwaccel_output option in conjunction with the -hwaccel option, like this: -hwaccel vaapi -hwaccel_output glx
This combination activates hardware accelerated video decoding with VA-API and Zero-copy rendering for X11 desktops.
SPECIAL NOTE:
1) Zero-copy is currently only implemented for X11-based systems (glx backend)
2) Zero-copy may also require additional packages to be installed. For example, in my case with Intel GPUs I need to install the intel-media-va-driver-non-free package
3) Actually, -hwaccel_output is not an FFmpeg option, but a CCTV Viewer option. Do not look for information about it in FFmpeg help. It's done that way for convenience and uniformity. It may be renamed or replaced by another mechanism in the future.
4) If possible, use a package from a PPA rather than SNAP. I have done my best, but due to container isolation, hardware-accelerated video decoding or Zero-copy rendering in SNAP may not work properly in some specific cases.
P.S. Due to the deep redesign of the built-in player, various regressions are possible. Please report about them in separate threads.
...where to download the 0.1.9 PPA? (we don't have/need snap in our Mint 21.2 NUC installation)
https://launchpad.net/~ievgeny/+archive/ubuntu/cctv-viewer
P.S. Ignore the version in the package name. I'll fix it soon.
OK, here's our 1st feedback for 6 x 2Mbps CBR RTSP streams (960x540 @ h264, 30fps) from our PTZ cameras.
VAAPI HW-decoding and Zero-copy work correctly, lowering CPU usage from 30 to ~10% and raising GPU usage from 0 to ~15%, but only after completely disabling the Xfce window manager's (Xfwm4, in our case) graphic acceleration: it does not work correctly with either software (Compositing: it blanks the second-to-last stream) or hardware (Compton: it produces display "errors" alternately on each stream) compositing enabled.
Hope that helps.
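For anyone wanting to reproduce this, the Xfwm4 compositor the commenter disabled can be toggled from the command line. A hedged sketch; the channel and property names are the standard xfconf ones, so verify them on your own Xfce installation:

```shell
# Disable the Xfwm4 compositor (what fixed the Zero-copy artifacts above):
xfconf-query -c xfwm4 -p /general/use_compositing -s false

# Re-enable it afterwards:
xfconf-query -c xfwm4 -p /general/use_compositing -s true
```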
note: do you plan to enable QSV too ?
note: do you plan to enable QSV too ?
All methods enumerated in $ ffmpeg -hwaccels should be supported. You may only need to install some driver packages for your platform. Additional work will only be required to implement Zero-copy for each method. "Polishing" the existing functionality is the priority at the moment.
All new features related to hardware acceleration of video decoding will be reported in this thread.
Could you please specify what hardware platform you have?
P.S.
By the way, try the -fflags nobuffer -flags low_delay options to reduce latency. Now all FFmpeg options are correctly passed to the corresponding subsystems.
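These two flags can also be evaluated outside the application with plain ffplay, to separate their effect from CCTV Viewer's own pipeline. A hedged sketch; the RTSP URL is a placeholder:

```shell
# Low-latency playback test: no demuxer buffering plus the codec-level
# low_delay flag, over TCP to avoid packet loss skewing the measurement.
ffplay -fflags nobuffer -flags low_delay \
       -rtsp_transport tcp \
       "rtsp://camera.example/stream"
```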
All methods enumerated in $ ffmpeg -hwaccels should be supported. You may only need to install some driver packages for your platform.
We've installed the intel-media-va-driver-non-free drivers but, of course, we can add the whole Intel oneAPI/VPL if needed.
Additional work will only be required to implement Zero-copy for each method. "Polishing" the existing functionality is the priority at the moment.
We'll soon test each method to report its correct functionality. Just a question: since FFmpeg accepts "auto" for the hwaccel option, does it work in CCTV Viewer too?
All new features related to hardware acceleration of video decoding will be reported in this thread.
We are tuned on.
Could you please specify what hardware platform you have?
As mentioned in this other issue, we use an Intel NUC NUC5i5MYHE that relies on an i5-5300U, which embeds the HD Graphics 5500.
P.S. By the way, try the -fflags nobuffer -flags low_delay options to reduce latency. Now all FFmpeg options are correctly passed to the corresponding subsystems.
OK, we'll test - and report - later.
Just a question: since FFMPEG accepts "auto" for hwaccel option, does it work in cctv too ?
"auto" is not currently supported. But it doesn't do any miracles.
Strictly speaking, FFmpeg options divide into 2 categories: those implemented by FFmpeg libraries (libavformat, libavcodec, etc.) and those implemented by FFmpeg utilities (ffmpeg, ffplay, ffprobe). The former are transferred to libraries as is, the latter must be implemented by CCTV Viewer and their implementation may differ or be missing.
As for QSV, it seems to require specifying a compatible codec https://trac.ffmpeg.org/wiki/Hardware/QuickSync#Decode-only This option and consequently feature is not yet available in CCTV Viewer. Looks like it's time to create a Wiki section....
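For reference, the QSV decode invocation described on that FFmpeg wiki page pairs -hwaccel qsv with an explicit QSV decoder, which is exactly the per-codec option CCTV Viewer does not expose yet. A hedged sketch; the input name is a placeholder:

```shell
# QSV decoding requires naming the matching hardware decoder explicitly
# (h264_qsv for H.264 input), unlike VA-API which picks one automatically.
ffmpeg -hwaccel qsv -c:v h264_qsv -i input.h264 -f null -
```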
OK, quick 2nd feedback: it works (with -fflags nobuffer -flags low_delay too) but only when choosing vaapi.
Every other acceleration - at least on our configuration - has no impact on the CPU load.
Later we'll report about latencies with and without optimization. (note: do you prefer pics or videos?)
-fflags nobuffer -flags low_delay - these options are not related to hardware acceleration and can be tested independently.
Text information is enough, but if it is supplemented with graphic materials, it will only be better.
1st set of latency tests
Same hw/sw config (Intel NUC NUC5i5MYHE / Mint 21.2 XFCE / latest cctv) and RTSP streams (6 x 960x540 / h264 / 30fps @ 2Mbps - CBR). Note that latency has been tested on the 1st stream (fullscreened on a Philips 190S8FB/00 monitor) only.
We ran the - Russian - Sekundomer Online Stopwatch on a Xiaomi Redmi Note 9S and grabbed both outputs using a Canon EOS 1200D.
Both the PTZ and DSLR cameras have been manually configured to a 1/250 shutter speed.
RESULTS:
- with the -fflags nobuffer -flags low_delay options (~200ms);
- with -hwaccel_output glx only (means it's skipped?);
- with the -fflags nobuffer -flags low_delay -hwaccel vaapi -hwaccel_output glx parameters together (~1s).
Let us know if you need more comprehensive tests (and relative images).
-hwaccel_output glx without -hwaccel vaapi makes no sense. It will just be software decoding.
The -fflags nobuffer option looks irrelevant if you look at the FFmpeg source code.
I don't have much hope for latency reduction with hardware decoding, but it makes sense to test the following keysets:
-flags low_delay
-hwaccel vaapi -flags low_delay
-hwaccel vaapi -hwaccel_output glx
-hwaccel vaapi -hwaccel_output glx -flags low_delay
I don't have much hope for latency reduction with hardware decoding, but it makes sense to test the following keysets:
Of course, HW-decoding target is the CPU-work offloading (= less energy consumption/heat) NOT latency.
* `-flags low_delay`
* `-hwaccel vaapi -flags low_delay`
* `-hwaccel vaapi -hwaccel_output glx`
* `-hwaccel vaapi -hwaccel_output glx -flags low_delay`
Ok, later we'll test and report. Just a question: do we need to test latencies for all 6 streams (together, of course), or is it the same by default?
Last but not least, in this stackoverflow reply @teocci suggests some other interesting FFMPEG's parameters to test: How to minimize the delay in a live streaming with ffmpeg
It makes sense to test only one thread so that the multithreaded environment does not distort the result when competing for system resources.
Last but not least, in this stackoverflow reply @teocci suggests some other interesting FFMPEG's parameters to test: How to minimize the delay in a live streaming with ffmpeg
Thanks for info.
Ok, after many tests - we also tried switching to a low-latency kernel - we can't obtain any latency lower than 150ms (no parameters) on our rig. It seems to be a "physical" limit.
Thanks for your active support.
Hi,
Just testing the program now. Is there an option to specify QuickSync- or VAAPI-based decoding? Will the program do that automatically?