iEvgeny / cctv-viewer

CCTV Viewer - viewer and mounter video streams.
GNU General Public License v3.0

[HELP] suggested distros (or optimizations) for lowest-possible latencies #55

Open MarcoRavich opened 1 year ago

MarcoRavich commented 1 year ago

Hi there, first of all, thanks for your cool (voluntary) work!

We finally chose this software as the only one (among the many others we tested) able to provide acceptable latency for live video monitoring of the multiple PTZ cameras, connected over a gigabit LAN, that we use for streaming live performances.

The Intel NUC (i5-5300U-based) we use can display, on a clean install of Mint 21.1/Xfce, 4 x 1080p/25 fps H.264 streams encoded at 2 Mbps VBR with a latency of less than 0.3 seconds (using ~40% of the CPU, depending on the motion in the scenes captured by the cameras). Since we have to display up to 8 streams (in 2 separate cctv-viewer instances, in order to drive separate outputs/monitors), we would like to know whether there are any recommended customizations (e.g. custom kernels) or distributions that we can use to keep latency (and, of course, CPU usage and temperatures) as low as possible, because operators require "near-zero latency" to drive the PTZ cams correctly.

Thanks in advance !
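To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It only restates the numbers from the post (25 fps, 2 Mbps per stream, 0.3 s latency, ~40% CPU for 4 streams). The linear CPU scaling is a naive assumption, not a measurement, and the function names are made up for this illustration.

```python
# Rough budget for the setup described in the post.
# All inputs come from the post; the linear CPU scaling is an assumption.

def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


def latency_in_frames(latency_s: float, fps: float) -> float:
    """How many frame intervals fit into the observed latency."""
    return latency_s * fps


def total_bitrate_mbps(streams: int, per_stream_mbps: float) -> float:
    """Aggregate nominal network bitrate for all streams."""
    return streams * per_stream_mbps


def naive_cpu_percent(streams: int, measured_streams: int, measured_percent: float) -> float:
    """Linear extrapolation of CPU load (assumption: load scales with stream count)."""
    return measured_percent * streams / measured_streams


if __name__ == "__main__":
    print(frame_interval_ms(25))          # milliseconds per frame at 25 fps
    print(latency_in_frames(0.3, 25))     # observed latency expressed in frames
    print(total_bitrate_mbps(8, 2.0))     # nominal load on the LAN for 8 streams
    print(naive_cpu_percent(8, 4, 40.0))  # naive CPU estimate for 8 streams
```

At 25 fps a frame lasts 40 ms, so a 0.3 s delay is roughly 7 to 8 frames. Under the linear assumption, 8 streams would take about 80% of the CPU, which leaves little headroom on this machine.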

iEvgeny commented 1 year ago

Hi! No extreme latency optimization tests have been done at this time. CCTV Viewer is specially designed in such a way that there is no output buffering. If your device has enough resources, the current frame after demultiplexing and decoding is immediately rendered. To some extent, you can influence the demuxing process with AVFormat options: https://ffmpeg.org/ffmpeg-formats.html#Format-Options For example, the -analyzeduration -probesize options are currently used to reduce start delays in stream playback and may not be optimal for your case. Also at the moment the work on the implementation of hardware decoding acceleration is 60-70% done, but due to the specifics of the application it does not always give a tangible gain in reducing the load on the CPU (However, in some scenarios the result is impressive).
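As an illustration of the AVFormat options mentioned above, here is a minimal sketch that builds an option string and an equivalent `ffplay` command line for experimenting outside the application. `-probesize`, `-analyzeduration`, `-fflags nobuffer` and `-flags low_delay` are standard FFmpeg options. The values, the helper names and the camera URL are assumptions chosen for illustration, not CCTV Viewer defaults or recommendations from the maintainer.

```python
import shlex

# Assumed starting values for experimentation (not CCTV Viewer defaults).
# Very small probesize/analyzeduration values can prevent FFmpeg from
# detecting stream parameters, so they may need to be raised per camera.
LOW_LATENCY_OPTIONS = {
    "fflags": "nobuffer",    # ask the demuxer to avoid buffering input
    "flags": "low_delay",    # request low-delay behaviour
    "probesize": "32768",    # bytes inspected when probing the stream
    "analyzeduration": "0",  # microseconds spent analysing the stream
}


def to_option_string(options: dict) -> str:
    """Render options as a single '-key value' string."""
    return " ".join(f"-{key} {shlex.quote(str(value))}" for key, value in options.items())


def ffplay_command(url: str, options: dict) -> list:
    """Build an ffplay argument list for testing the same options by hand."""
    args = ["ffplay"]
    for key, value in options.items():
        args += [f"-{key}", str(value)]
    args.append(url)
    return args


if __name__ == "__main__":
    # Hypothetical camera URL, for illustration only.
    url = "rtsp://camera.example/stream"
    print(to_option_string(LOW_LATENCY_OPTIONS))
    print(shlex.join(ffplay_command(url, LOW_LATENCY_OPTIONS)))
```

Comparing latency with and without each option, one at a time, should show which of them matter for a given camera.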

MarcoRavich commented 1 year ago

> Hi! No extreme latency optimization tests have been done at this time. CCTV Viewer is specially designed in such a way that there is no output buffering. If your device has enough resources, the current frame after demultiplexing and decoding is immediately rendered.

Hi there, thanks for your reply.

I'm not a dev but, if I've understood correctly, this approach seems similar to the new one in @free5ty1e's picamframegrid (called "DIRECT TO FRAMEBUFFER METHOD", which displays directly without an intermediate file or the need to compile multiple framegrabs into a single image), even though, unfortunately, it doesn't give acceptable latency on the RPi platform.

That's why we chose to use an Intel NUC i5 (5th gen) based PC for low-latency monitoring.

> To some extent, you can influence the demuxing process with AVFormat options: https://ffmpeg.org/ffmpeg-formats.html#Format-Options For example, the -analyzeduration -probesize options are currently used to reduce start delays in stream playback and may not be optimal for your case.

We'll certainly experiment with both these FFmpeg parameters and the cameras' hardware encoder settings to optimize the decoding flow and obtain the lowest possible latency.

> Also at the moment the work on the implementation of hardware decoding acceleration is 60-70% done, but due to the specifics of the application it does not always give a tangible gain in reducing the load on the CPU (However, in some scenarios the result is impressive).

Like many others, we're waiting for the implementation of GPU video decoding, which will certainly help keep the hardware's working temperatures low.

Thanks in advance for what you're doing.

iEvgeny commented 1 year ago

> @free5ty1e's picamframegrid

This is a very, very strange but amazing approach to solving this problem. It looks like a classic attempt to implement a web server in Bash as an academic exercise, which, however, has nothing to do with efficiency.

I meant the absence of video buffering, which is present in every player and introduces a significant delay of up to several tens of seconds. In addition, the new implementation uses zero-copy rendering whenever possible.

MarcoRavich commented 1 year ago

Hi again, during our search for low-latency RTSP "viewers" on GitHub, we found the EasyPlayer repositories, by the Chinese developer @tsingsee, which claim to perform hardware decoding on supported platforms (Windows/Android/iOS) with very low delay:

https://github-com.translate.goog/tsingsee/EasyPlayer-RTSP?_x_tr_sl=auto&_x_tr_tl=en

Dunno if this can help or inspire you in any way, but hope so.

Note: hoping to do something useful for open-source software developers, we are collecting/documenting some (re)sources that we'll share under HyMPS \ VIDEO.