raspberrypi / rpicam-apps

BSD 2-Clause "Simplified" License

mjpeg encoding framerate #134

Closed osterwood closed 2 years ago

osterwood commented 2 years ago

I saw in issue https://github.com/raspberrypi/libcamera-apps/issues/23 that @davidplowman mentioned 10 FPS MJPEG encoding at 12 MP -- and I'm wondering how to achieve that (or near that). On my CM4, testing with the HQ camera, I can get ~5.8 FPS encoding when cropping to 4056 x 2160. Frame rate is calculated from the average frame time reported by mjpeg_encoder.cpp when the verbose flag is on.

I've tried various settings on denoise and image quality, but none have helped to increase encoding rate. Are there other parameters which I should be using to help increase encoding rate?

Thanks in advance for any ideas.

davidplowman commented 2 years ago

Hi, thanks for the question. Could you post the command line you're using? That would make it easier for us to know exactly what you've tried.

Secondly, those times that come out of mjpeg_encoder.cpp are probably not reliable because it's counting the frames for only one thread (of four, I think) but measures total elapsed time. So you might need to find a different way to measure that. One thing to try is the --framerate option (e.g. --framerate 5.0) and crank the number up until the console starts producing messages about "unmatched frames".

osterwood commented 2 years ago

Thanks for the quick reply. With 6 FPS, I start seeing "dropping unmatched input frame" messages. I am estimating encoding FPS as FPS = 1000/SUM(average times) * 4 (as there are 4 encoding threads).

Test argument to reproduce:

./libcamera-vid -n --codec mjpeg --segment 0 -o video.mjpeg --width 4056 --height 2160 --framerate 6 --timeout 60000 -v --save-pts times.pts

I've experimented with --denoise off and --denoise cdn_off and --quality set to 20, 80, 100 -- but no real change in encoding rate.

Encode 78 frames, average time 750.81ms
Encode 81 frames, average time 728.522ms
Encode 81 frames, average time 730.149ms
Encode 81 frames, average time 729.356ms
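
One way to turn the four per-thread averages above into a throughput figure is sketched below. It assumes the four threads encode in parallel and each works on one frame at a time, so each contributes 1000/average frames per second; `estimated_fps` is a hypothetical helper, not part of libcamera-apps. As noted earlier in the thread, these averages may not be reliable, so treat the result as a rough guide only.

```python
# Per-thread average encode times in ms, copied from the verbose output above.
cm4_avg_ms = [750.81, 728.522, 730.149, 729.356]


def estimated_fps(avg_ms_per_thread):
    """Sum the per-thread throughput in frames per second.

    Assumption: threads run in parallel and each one encodes a single
    frame at a time, so a thread averaging t ms delivers 1000 / t FPS.
    """
    return sum(1000.0 / t for t in avg_ms_per_thread)


print(f"estimated throughput: {estimated_fps(cm4_avg_ms):.2f} FPS")
```

For these numbers the estimate comes out at roughly 5.4 FPS, which is in line with frames being dropped at --framerate 6.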

If you graph the saved PTS file you'll see vertical jumps as frames are missed. Blue line here is IDX*1000/6, green line is times from the PTS file.

[Image: Screen Shot 2021-11-01 at 11 04 24 AM, graph of expected frame times (blue) and PTS file times (green)]
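
The vertical jumps described above can also be found without plotting. The sketch below assumes the file written by --save-pts holds one timestamp in milliseconds per line, with optional comment lines starting with `#`; that layout is an assumption, and `read_pts` and `find_gaps` are hypothetical helpers.

```python
def read_pts(path):
    """Read one timestamp (ms) per line, skipping blank and '#' lines.

    Assumption: this matches the layout of the --save-pts output file.
    """
    with open(path) as f:
        return [
            float(line)
            for line in f
            if line.strip() and not line.startswith("#")
        ]


def find_gaps(pts_ms, framerate, tolerance=1.5):
    """Return (index, gap_ms) for every frame that arrives late.

    A frame counts as late when the gap to the previous timestamp is
    larger than tolerance times the nominal interval 1000 / framerate.
    """
    interval = 1000.0 / framerate
    gaps = []
    for i in range(1, len(pts_ms)):
        delta = pts_ms[i] - pts_ms[i - 1]
        if delta > tolerance * interval:
            gaps.append((i, delta))
    return gaps
```

Calling `find_gaps(read_pts("times.pts"), 6)` would list the positions where at least one frame appears to have been skipped at a nominal 6 FPS.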

davidplowman commented 2 years ago

So I've just tried this on my regular Pi 4 and it sustains 10fps with no frame drops. However, my system may be slightly different to yours. Here are some things to check and some possible differences.

osterwood commented 2 years ago

Interesting. I have recompiled libcamera with:

meson build --buildtype=release -Dpipelines=raspberrypi -Dipas=raspberrypi -Dv4l2=true -Dgstreamer=disabled -Dtest=false -Dlc-compliance=disabled -Dcam=disabled -Dqcam=disabled -Ddocumentation=disabled
ninja -C build
sudo ninja -C build install

and apps with

cd ~/libcamera-apps/build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

But there is no change. Still seeing dropped frames at 6 FPS. CPU freq ramps to 1.5 GHz and stays there the whole test. CPU temperature (via vcgencmd measure_temp) maxes out at 45.7 C.

Removing the output file has the same result. Dropped frames and average encoding time of ~730 ms. Also pulled the most recent libcamera and rebuilt again -- no improvement.

osterwood commented 2 years ago

Strangest thing to me is the vastly different CPU temperatures (you see 70C and I see 45 to 46 C).

davidplowman commented 2 years ago

Yes, it's all getting a bit puzzling. Let me try and find a CM4 tomorrow and I'll see how that goes.

osterwood commented 2 years ago

I flashed a PI4 with recent raspbian and installed libcamera-apps via apt -- and I am able to do 10 FPS with zero dropped frames. Load average about 2.1 and CPU temp around 45C (I have a reasonably good passive heatsink on the CPU, like my CM4 setup).

Viewfinder frame 590
Viewfinder frame 591
Viewfinder frame 592
Camera stopped!
Encode 164 frames, average time 245.4ms
Encode 172 frames, average time 248.752ms
Encode 76 frames, average time 249.474ms
Encode 180 frames, average time 246.576ms
MjpegEncoder closed
Closing Libcamera application(frames displayed 591, dropped 0)
Camera stopped!
Tearing down requests, buffers and configuration
Camera closed

I'll set up a new CM4 and see what happens there. Maybe there is some strange library or install issue with the CM4 I've been developing on.

davidplowman commented 2 years ago

That's interesting. Well I suppose it's good that you have something there that behaves as expected, but I'll see if I can try a CM4 in the morning. As you've discovered, installing via apt is convenient - the preparations for Bullseye have involved making proper packages which can now be installed on Buster too.

osterwood commented 2 years ago

The PI4 showed similar results after using libcamera and libcamera-apps installed from git (instead of apt), so I kept digging. I started looking for system differences between the PI4 and the CM4, and this difference between the two was the "ah ha!" moment:

pi@PI4:~ $ cat /proc/cpuinfo 
processor   : 0
model name  : ARMv7 Processor rev 3 (v7l)
BogoMIPS    : 108.00
Features    : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
pi@CM4:~ $ cat /proc/cpuinfo 
processor   : 0
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid

uname -a showed armv7l and aarch64 respectively.

I changed arm_64bit from 1 to 0 in /boot/config.txt and the CM4 now does frame encoding at ~250ms like the PI4.

I have no idea why the CPU is missing the acceleration features when in 64-bit mode. Is this expected behavior? Lack of NEON and the other features definitely explains the performance difference.

pelwell commented 2 years ago

The cpuinfo output is showing the optional features that are present in 64-bit mode. aarch64 makes many of the optional armv7 CPU features standard - NEON, for example - so the fact that it isn't listed there doesn't mean it isn't available.

The arm64 cpuinfo support contains the following table of known features:

    [KERNEL_HWCAP_FP]       = "fp",
    [KERNEL_HWCAP_ASIMD]        = "asimd",
    [KERNEL_HWCAP_EVTSTRM]      = "evtstrm",
    [KERNEL_HWCAP_AES]      = "aes",
    [KERNEL_HWCAP_PMULL]        = "pmull",
    [KERNEL_HWCAP_SHA1]     = "sha1",
    [KERNEL_HWCAP_SHA2]     = "sha2",
    [KERNEL_HWCAP_CRC32]        = "crc32",
    [KERNEL_HWCAP_ATOMICS]      = "atomics",
    [KERNEL_HWCAP_FPHP]     = "fphp",
    [KERNEL_HWCAP_ASIMDHP]      = "asimdhp",
    [KERNEL_HWCAP_CPUID]        = "cpuid",
    [KERNEL_HWCAP_ASIMDRDM]     = "asimdrdm",
    [KERNEL_HWCAP_JSCVT]        = "jscvt",
    [KERNEL_HWCAP_FCMA]     = "fcma",
    [KERNEL_HWCAP_LRCPC]        = "lrcpc",
    [KERNEL_HWCAP_DCPOP]        = "dcpop",
    [KERNEL_HWCAP_SHA3]     = "sha3",
    [KERNEL_HWCAP_SM3]      = "sm3",
    [KERNEL_HWCAP_SM4]      = "sm4",
    [KERNEL_HWCAP_ASIMDDP]      = "asimddp",
    [KERNEL_HWCAP_SHA512]       = "sha512",
    [KERNEL_HWCAP_SVE]      = "sve",
    [KERNEL_HWCAP_ASIMDFHM]     = "asimdfhm",
    [KERNEL_HWCAP_DIT]      = "dit",
    [KERNEL_HWCAP_USCAT]        = "uscat",
    [KERNEL_HWCAP_ILRCPC]       = "ilrcpc",
    [KERNEL_HWCAP_FLAGM]        = "flagm",
    [KERNEL_HWCAP_SSBS]     = "ssbs",
    [KERNEL_HWCAP_SB]       = "sb",
    [KERNEL_HWCAP_PACA]     = "paca",
    [KERNEL_HWCAP_PACG]     = "pacg",
    [KERNEL_HWCAP_DCPODP]       = "dcpodp",
    [KERNEL_HWCAP_SVE2]     = "sve2",
    [KERNEL_HWCAP_SVEAES]       = "sveaes",
    [KERNEL_HWCAP_SVEPMULL]     = "svepmull",
    [KERNEL_HWCAP_SVEBITPERM]   = "svebitperm",
    [KERNEL_HWCAP_SVESHA3]      = "svesha3",
    [KERNEL_HWCAP_SVESM4]       = "svesm4",
    [KERNEL_HWCAP_FLAGM2]       = "flagm2",
    [KERNEL_HWCAP_FRINT]        = "frint",
    [KERNEL_HWCAP_SVEI8MM]      = "svei8mm",
    [KERNEL_HWCAP_SVEF32MM]     = "svef32mm",
    [KERNEL_HWCAP_SVEF64MM]     = "svef64mm",
    [KERNEL_HWCAP_SVEBF16]      = "svebf16",
    [KERNEL_HWCAP_I8MM]     = "i8mm",
    [KERNEL_HWCAP_BF16]     = "bf16",
    [KERNEL_HWCAP_DGH]      = "dgh",
    [KERNEL_HWCAP_RNG]      = "rng",
    [KERNEL_HWCAP_BTI]      = "bti",
    [KERNEL_HWCAP_MTE]      = "mte",

Notice the absence of neon.

An important aspect of the MJPEG encoding performance is where the encoding takes place - is it in a kernel driver or in a userspace module? I ask because unless you have installed a trial 64-bit OS then your userland is still running in 32-bit mode.
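
The kernel/userland split mentioned here can be checked from inside a process. This is a minimal sketch using only the Python standard library: `platform.machine()` returns the machine string the kernel reports (as `uname -m` does), and the size of a native pointer shows whether the running interpreter is a 32-bit or 64-bit binary. `is_mixed_mode` is a hypothetical helper and only recognises the `aarch64` string seen in this thread.

```python
import platform
import struct


def is_mixed_mode(kernel_machine, pointer_bits):
    """True for a 64-bit ARM kernel running a 32-bit userland process."""
    return kernel_machine == "aarch64" and pointer_bits == 32


kernel_machine = platform.machine()        # e.g. "armv7l" or "aarch64"
pointer_bits = struct.calcsize("P") * 8    # width of a native pointer

print(f"kernel reports: {kernel_machine}")
print(f"this process uses {pointer_bits}-bit pointers")
print(f"64-bit kernel with 32-bit userland: "
      f"{is_mixed_mode(kernel_machine, pointer_bits)}")
```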

You can query the CPU capabilities while running in 32-bit by running any 32-bit executable (we'll use sleep) with a magic environment variable set:

pi@raspberrypi:~$ LD_SHOW_AUXV=1 sleep 1
AT_HWCAP:        half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae evtstrm
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0x10034
AT_PHENT:             32
AT_PHNUM:             9
AT_BASE:              0xf7e58000
AT_FLAGS:             0x0
AT_ENTRY:             0x11188
AT_UID:               1000
AT_EUID:              1000
AT_GID:               1000
AT_EGID:              1000
AT_SECURE:            0
AT_RANDOM:            0xffb9f62c
AT_HWCAP2:       crc32
AT_EXECFN:            /usr/bin/sleep
AT_PLATFORM:          v8l

Here you see a list much closer to that reported by cpuinfo on the 32-bit kernel.
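
The same information is available to a process through its auxiliary vector, which Linux exposes as `/proc/self/auxv`. The sketch below decodes that vector and tests the SIMD bit. It assumes a little-endian system, and the constants are taken from the Linux headers as I understand them (`AT_HWCAP` is 16, NEON is bit 12 of the 32-bit ARM HWCAP word, ASIMD is bit 1 of the AArch64 one); `parse_auxv` and `own_hwcap` are hypothetical helpers.

```python
import struct

AT_NULL = 0                  # terminator entry
AT_HWCAP = 16                # key of the hardware capability word
HWCAP_NEON_ARM32 = 1 << 12   # 32-bit ARM: NEON
HWCAP_ASIMD_ARM64 = 1 << 1   # AArch64: Advanced SIMD


def parse_auxv(data, word_bytes):
    """Decode raw auxv bytes into a {key: value} dict.

    Each entry is a pair of native words (4 or 8 bytes each).
    Assumption: little-endian byte order.
    """
    fmt = "<II" if word_bytes == 4 else "<QQ"
    usable = len(data) - len(data) % (2 * word_bytes)
    entries = {}
    for key, value in struct.iter_unpack(fmt, data[:usable]):
        if key == AT_NULL:
            break
        entries[key] = value
    return entries


def own_hwcap():
    """Return AT_HWCAP for the current process (Linux only)."""
    with open("/proc/self/auxv", "rb") as f:
        auxv = parse_auxv(f.read(), struct.calcsize("P"))
    return auxv.get(AT_HWCAP, 0)
```

In a 32-bit ARM process, `own_hwcap() & HWCAP_NEON_ARM32` being non-zero would indicate that the kernel advertises NEON to that process, whatever /proc/cpuinfo lists.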

So - where is the MJPEG encoding performed?

davidplowman commented 2 years ago

mjpeg encoding is performed by the standard libjpeg library, in userspace.

pelwell commented 2 years ago

It would be helpful to be able to isolate the MJPEG encode performance from the rest of the system - is there a way to encode from a file of raw image data?

davidplowman commented 2 years ago

Hmm, I don't think the code lends itself very directly to benchmarking libjpeg on its own, though libjpeg is very widely used so maybe there are benchmarks out there. I'll have a look around.

Chris, could you perhaps say where the image was coming from that you were using on your CM4? Was it a 64-bit beta image, or did you add the 64-bit flag manually? That would make it easier for me to try and copy the procedure. I tried our latest 64-bit test image but that appears to run fine at 10fps.

osterwood commented 2 years ago

I'm currently using 2021-05-07-raspios-buster-armhf-lite.img, with 5.10.17-v7l+ and 5.10.63-v7l+ kernels. I changed the 64bit flag myself during my initial system setup (when I was adding the IMX477 dtoverlay, changing the GPU memory size, etc).

I take it that I should be using 2021-05-07-raspios-buster-arm64-lite.img instead? Until just now I didn't realize there was a separate OS image for 64 bit.

pelwell commented 2 years ago

It shouldn't be necessary to run the full 64-bit OS - both ought to work, but in general you are still more likely to encounter problems on 64-bit.

davidplowman commented 2 years ago

I've gone through the procedure installing a 32-bit OS and then setting arm_64bit=1 before building all the apps. As we've discovered, it runs poorly. However, if I try this

setarch linux32 ./libcamera-vid ...

then it runs fine again. So it sounds like there might be some confusion within libjpeg and it's incorrectly using the kernel architecture to decide the available optimisations at runtime? I'm not sure what to suggest as the best workaround...
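
Until the cause is pinned down, the `setarch linux32` workaround could be applied only when it is needed. This is a sketch of one possible launcher, assuming the slow path is hit exactly when a 32-bit process runs on a kernel reporting `aarch64`; `with_linux32` is a hypothetical helper, and the example arguments are a subset of the command line posted earlier in the thread.

```python
import platform
import struct


def with_linux32(cmd, kernel_machine, pointer_bits):
    """Prefix cmd with 'setarch linux32' on a 64-bit kernel, 32-bit userland.

    Assumption: the prefix is only wanted in that combination; in every
    other case the command is returned unchanged.
    """
    if kernel_machine == "aarch64" and pointer_bits == 32:
        return ["setarch", "linux32", *cmd]
    return list(cmd)


argv = with_linux32(
    ["./libcamera-vid", "-n", "--codec", "mjpeg", "-o", "video.mjpeg"],
    platform.machine(),
    struct.calcsize("P") * 8,
)
print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run` as it stands.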

osterwood commented 2 years ago

Thanks all for your help in diagnosing the issue here (and in the education about 32 and 64 CPU features and flags).

It does seem like it is something within libjpeg at runtime. One interesting note, after changing arm_64bit from 1 to 0, the binaries ran 3x faster even without re-building them.

For now, I think I'll just stay with the default OS with a 32 bit kernel as I'm using 2GB and 4GB CM4s. If we change to 8GB in the future, I'll revisit. Unless you know of other advantages in going to the 64 bit beta image.

pelwell commented 2 years ago

FYI the 32-bit kernel uses a feature called Large Physical Address Extension that allows it to address more than 4GB of RAM. Individual processes can't exceed 3GB, but it's perfectly possible to use all the available RAM across a whole system, e.g. 3 chromium tabs (joke, ish).