iEvgeny / cctv-viewer

CCTV Viewer - viewer and mounter video streams.
GNU General Public License v3.0

[SUGGESTION] AVX/AVX2/AVX-512 optimizations ? #62

MarcoRavich closed 1 year ago

MarcoRavich commented 1 year ago

Hi there, since we don't have any embedded dev, we honestly don't know if this is a technically correct suggestion. Anyway, we found some interesting resources about it:

  1. Improving the compute performance of video processing software using AVX (Advanced Vector Extensions) instructions
  2. [VIMEO] Optimizing for AVX2
  3. SIMD Acceleration for HEVC Decoding
  4. Accelerating x265 with Intel® Advanced Vector Extensions 512
  5. Intel AVX-512 tested in x265: how to enable it and does it help?
  6. AVX Optimizations and Performance: VisualStudio vs GCC

From what we understand, the performance improvement should range between 5% and 10%.

Hope that inspires!

MarcoRavich commented 1 year ago

Bump.

Many A/V decoders and encoders are introducing AVX optimizations; here are a couple of examples:

Last but not least, we've discovered this interesting (2019) article by @blegal and @cjego:

Hope that helps.

iEvgeny commented 1 year ago

Low-level optimizations rely entirely on implementation in ffmpeg and are outside the purview of application software developers. I'm currently busy implementing Zero Copy and I have to admit I'm stuck. The poor quality of graphics drivers for Linux and the specificity of the problem require more time. Progress is being made and I've already gotten the first results of reduced CPU and memory load, but I can't say anything about reduced playback latency yet.
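To illustrate why these optimizations sit below the application: SIMD-heavy libraries such as ffmpeg typically ship several versions of a hot routine and pick one at run time based on what the CPU reports. The following is only a minimal sketch of that dispatch pattern, not ffmpeg's actual code. The `brighten_*` kernels, `select_brighten` and `brighten_frame` are hypothetical names invented for this example, and it assumes GCC or Clang (for `__builtin_cpu_supports` and the `target` attribute); on other compilers or non-x86 CPUs it falls back to the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_DISPATCH 1
#include <immintrin.h>
#endif

// Hypothetical kernel signature: add a brightness offset to 8-bit samples.
using BrightenFn = void (*)(std::uint8_t* data, std::size_t n, std::uint8_t delta);

// Portable fallback: saturating add, one sample at a time.
static void brighten_scalar(std::uint8_t* data, std::size_t n, std::uint8_t delta) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        unsigned sum = static_cast<unsigned>(data[i]) + delta;
        data[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}

#ifdef SKETCH_X86_DISPATCH
// AVX2 variant: 32 samples per iteration. The target attribute lets this
// one function use AVX2 even if the rest of the file is built without -mavx2.
__attribute__((target("avx2")))
static void brighten_avx2(std::uint8_t* data, std::size_t n, std::uint8_t delta) {
    const __m256i d = _mm256_set1_epi8(static_cast<char>(delta));
    std::size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        v = _mm256_adds_epu8(v, d);  // unsigned saturating add
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(data + i), v);
    }
    brighten_scalar(data + i, n - i, delta);  // remaining tail samples
}
#endif

// Choose the fastest kernel the running CPU supports.
static BrightenFn select_brighten() {
#ifdef SKETCH_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return brighten_avx2;
    }
#endif
    return brighten_scalar;
}

// The caller never needs to know which variant was chosen.
static void brighten_frame(std::vector<std::uint8_t>& frame, std::uint8_t delta) {
    static const BrightenFn fn = select_brighten();
    fn(frame.data(), frame.size(), delta);
}
```

With this pattern the caller (here `brighten_frame`) gets the AVX2 path on capable hardware without being compiled with any AVX flags itself, which is why the choice of instruction set is the library's concern rather than the application's.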

MarcoRavich commented 1 year ago

First of all, thanks for the reply.

> Low-level optimizations rely entirely on implementation in ffmpeg and are outside the purview of application software developers.

Of course, but I believe that building binaries with AVX* compiler optimizations may help too.

> I'm currently busy implementing Zero Copy and I have to admit I'm stuck. The poor quality of graphics drivers for Linux and the specificity of the problem require more time.

Dunno if it can help/inspire in any way, but I recently read this article about (VirtIO) zero-copy: https://www.phoronix.com/news/VirtIO-Vsock-MSG-Zerocopy

> Progress is being made and I've already gotten the first results of reduced CPU and memory load, but I can't say anything about reduced playback latency yet.

CCTV latency is more than acceptable for its primary use (monitoring surveillance cameras), but optimizing it could allow it to be used in more time-sensitive applications.

Last but not least, if you feel temporarily deadlocked on ZC, it might be useful, even as a "recreation", to focus on other features (such as overlays).

Thanks again for your (volunteer) work!

iEvgeny commented 1 year ago

CCTV Viewer does not operate on frame buffers directly. This is done by ffmpeg and partly by Qt. As an experiment, you can rebuild ffmpeg and Qt on your machine locally with the desired optimizations. But I don't think it will have a noticeable effect.
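For anyone who wants to try that experiment, here is a rough sketch of what compiler-level AVX flags actually affect. It assumes GCC or Clang on x86; `compiled_simd_level` and `blend_frames` are hypothetical names made up for this example, and the loop is only a stand-in for the per-pixel work that, as noted above, happens inside ffmpeg and Qt rather than in CCTV Viewer. Building the same file once with `-O3` and once with `-O3 -mavx2` (or `-O3 -march=native`) changes the predefined macros and lets the auto-vectorizer use wider registers for the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reports which x86 vector extension the *compiler* was allowed to target.
// The macros are predefined by GCC/Clang according to flags such as
// -mavx2 or -march=native; they say nothing about the CPU at run time.
inline std::string compiled_simd_level() {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline";
#endif
}

// Hypothetical per-pixel workload: rounded average of two 8-bit frames.
// A simple loop like this is a typical auto-vectorization candidate.
inline std::vector<std::uint8_t> blend_frames(const std::vector<std::uint8_t>& a,
                                              const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>((a[i] + b[i] + 1) / 2);
    }
    return out;
}
```

GCC's `-fopt-info-vec` and Clang's `-Rpass=loop-vectorize` report which loops were vectorized, which is a quick way to see whether a flag changed anything. Code that already selects hand-written SIMD routines at run time is much less sensitive to these flags, which is consistent with the expectation that a rebuild would have little noticeable effect.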