Open ricardopretrazy opened 2 months ago
I don't really understand why you're getting CMA allocation failures, because a Pi 5 shouldn't be using CMA at all. Can you paste the output of ls -l /dev/dma_heap
?
Is there anything else much running while you are using this camera script?
hi @davidplowman thanks for your quick response. Here is the result from dma_heap
I'm not specifically running anything else, it's a very basic installation and this is the only thing I'm running.
Well, that's wrong. On a Pi 5, vidbuf_cached
should be a link to system
, not linux,cma
.
Do you have any idea how that's happened? Have you installed something that might have done this? If you have a spare SD card lying around, maybe try installing a clean Raspberry Pi OS and check that it links to system
.
hmm. that's interesting, I've got 6 PIs here, and just looked on a few others, and they have vidbuf_cached -> system
as you suggest. I don't really know what this means. As this problem is intermittent, I do wonder if I reboot, it will revert back to system
again.
Is there a way to modify this to 'system' to see if it fixes this error (while I still have the PI booted up & erroring)
Well, I think it's only a symbolic link. Presumably you could go in as root and recreate it. But just be sure that there's nothing there you can't recover, just in case it never boots again.
So I think that file system does get created when you boot, but I've never heard of it pointing to the wrong thing. It would be interesting if rebooting fixes the link, or not.
I've been reluctant to reboot, as I've been trying to replicate this error (which happened a few times in production) this is the first time I've seen it locally on any of the PIs I've got here.
I have rebooted, and it has gone back to 'system', and the script runs, I've tried running it about 10 times and it seems ok, but as I mentioned before it's intermittent, and will very likely happen again.
I suppose the interesting thing will to be to see if it switches spontaneously to linux,cma
when it fails, or just before it fails. I had absolutely no idea that such a thing could happen, but I guess we'll find out...
I've been trying to replicate this all day, and almost gave up, but then rebooted the PIs and ran the script, it failed on two of the PIs and when checking the vidbuf_cache
it had, again, been set to linux, cma. It's interesting that it has been fine all day, and only after rebooting, did it occur again.
Can you think of anything that might be causing this? I've googled vidbuf_cache
, but it only returns 2 results (one of them is this thread)
I've added more logging into my software to display this vidbuf_cache
value before it starts/captures, to see if it's already set.
That's so weird. I'll ask around here at Pi Towers.
It happened again this morning, after rebooting 3 times (I have 6 PIs here). And it has been a different PI each time it has occurred, so I don't think it's anything dodgy with a particular PI.
Are there any specific logs you'd be interested to see? or any other device information?
And, this is before I've done anything with PICamera2, I've just been SSHing onto the machines to get the vidbuf_cache value.
Yes, a dmesg log would be great. We'll see if anyone else suggests anything more.
Here is the dmesg
log, let me know if there are any other logs. This doesn't have the ..,
cma: cma_alloc: linux,cma: alloc failed, req-size: 991 pages, ret: -12
(<-- that's from a previous log where I tried to capture)
...error, as this is just me observing that the symbolic link is pointing to linux,cma
rather than system
Actually, maybe post a "good" version of dmesg as well, we can look for any differences.
I've rebooted the same machine, and it was back to system
(as I expected), here is the dmesg
content for this good boot.
I did try this yesterday, but it's quite difficult to compare, because the log messages are in a slightly different order.
Could you also attach the contents of /proc/cpuinfo
in both the good and bad cases, when you manage to catch it again? Thanks.
This is the good one, I'll have to wait for it to happen again.. I'll try some re-boots
cat /proc/cpuinfo
processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 1
processor : 1
BogoMIPS : 108.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 1
processor : 2
BogoMIPS : 108.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 1
processor : 3
BogoMIPS : 108.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 1
Revision : c04170
Serial : 5c02c1cf7adb8836
Model : Raspberry Pi 5 Model B Rev 1.0
I've rebooted three times, and a different PI was set to linux,cma
This is the cpuinfo for that PI (i.e. when it's in the 'bad' state)
I can grant you (or someone in your team) access to connect to the PI if you'd like to investigate further?
rich@pi-117:~ $ cat /proc/cpuinfo
processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 1
processor : 1
BogoMIPS : 108.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 1
processor : 2
BogoMIPS : 108.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 1
processor : 3
BogoMIPS : 108.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 1
Revision : c04170
Serial : 77426980eba7ae7a
Model : Raspberry Pi 5 Model B Rev 1.0
rich@pi-117:~ $ ls -l /dev/dma_heap/
total 0
crw-rw---- 1 root video 253, 1 Aug 30 11:12 linux,cma
crw-rw---- 1 root video 253, 0 Aug 30 11:12 system
lrwxrwxrwx 1 root root 9 Aug 30 11:12 vidbuf_cached -> linux,cma
The destination of the symlink is supposed to be determined at boot by looking at the Revision:
field in /proc/cpuinfo
. And it looks fine in both cases, but tbh, it's hard to see how that could go wrong.
I wonder if there might be some timing issues, and it depends on exactly when system performs that action. I notice in your dmesg logs that the camera comes up earlier in the "bad" log. I wonder if there's any kind of pattern there. Out of interest, have you ever noticed this with a Raspberry Pi camera? Though with smaller images you might not notice without checking explicitly - because the CMA area will be big enough.
OK, here are a couple more ideas. First of all can you add udev.log-priority=debug
to the end of your /boot/firmware/cmdline.txt
. That should give us some more debug. To see this debug, use sudo journalctl -b0
. I don't think we're too fussed about the "good" case, but of course it's the "bad" case that is harder to provoke.
I wonder if this happens quite frequently on PIs, but rarely gets noticed. I was originally just capturing in either 1/2 or full resolutions, it's only since I've been starting in 1/2, then capturing in full that this issue has become apparent. I'm guessing it takes a bit more memory to switch modes rather than just starting with full and capturing in the same mode.
I have a few HQ cameras, but I doubt they'd error, as they take up a lot less memory, so I'm guessing that even if it was set to linux,cma
it would still work fine.
The memory errors I'm getting are a symptom of the vidbuf_cached issue, rather than the cause (I believe)
The tests I've been doing today haven't touched the cameras, although they're all still connected.
I'm still not seeing anything in the logs jump out and me, I'm afraid, and I'm also not here next week.
But it would be interesting to know if you happen to notice any more patterns, or whether you see it with other cameras (now that you know how to check right after boot if it's gone wrong). In the meantime, I'll ask folks here to keep a lookout for this, and we can perhaps establish how widespread it is.
I can reproduce the issue and have managed to get some useful output. I think what's happening is that the udev rule runs before the 'system' node shows up, so creating the symlink fails and it falls through to the other case. Not 100% sure, still investigating.
Edit: Nope, that doesn't make sense...
The right thing happens and then the symlink gets replaced.
Aug 30 16:31:08 dmatest (udev-worker)[686]: system: Successfully created symlink '/dev/dma_heap/vidbuf_cached' to '/dev/dma_heap/system'
...
Aug 30 16:31:08 dmatest (udev-worker)[682]: linux,cma: Successfully created symlink '/dev/dma_heap/vidbuf_cached' to '/dev/dma_heap/linux,cma'
So I think it's a race condition between whether system
or linux,cma
shows up first. I think the way the rule is written means that if linux,cma
rule matches, it executed the symlink step no matter what, wheras the system symlink only happens if the revision matches the given pattern.
@davidplowman I'm guessing that it is quite widespread, but unless you're running something that will fail due to this symlink, you won't ever know. I've got 6 PIs here, and I'd say that it happens to 1 PI about once every 3 power cycles, so maybe a 1/20 ish chance, which seem in line with what we were seeing in production where we had 40 PIs and 1 or 2 of them would fail.
@XECDesign I'm happy that you were able to reproduce, so at least it's not just me... and I've just seen your latest comment whilst typing.
Do you think there might be a config change I could make to prevent this?
I don't know what the right fix is right now, but to temporarily work around the issue you can edit /lib/udev/rules.d/60-dma-heap.rules
and comment out this line:
SUBSYSTEM=="dma_heap", KERNEL=="linux,cma", SYMLINK+="dma_heap/vidbuf_cached"
Then when I know the right fix, I can push that out and your change will be overwritten.
Testing with , OPTIONS+="link_priority=-50"
added to the end of that rule. If I am understanding it right, then that should mean that when it notices there's already a symlink there (with a default priority of 0), it will skip creating another one.
My pi is set up to keep rebooting until it the issue occurs. It has been going for a while, so maybe that's the fix.
I think the rule is meant to look like this:
SUBSYSTEM=="dma_heap", GROUP="video", MODE="0660"
SUBSYSTEM=="dma_heap", KERNEL=="system", PROGRAM="/bin/grep -q \"^Revision\s*:\s*[ 123][0-9a-fA-F][0-9a-fA-F]4[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]$$\" /proc/cpuinfo", SYMLINK+="dma_heap/vidbuf_cached"
SUBSYSTEM=="dma_heap", KERNEL=="linux,cma", SYMLINK+="dma_heap/vidbuf_cached", OPTIONS+="link_priority=-50"
My .rules
file is slightly different to yours. I've got a GOTO="dma_end"
at the end of the 2nd line.. and then the label at the end.
$ cat 60-dma-heap.rules
SUBSYSTEM=="dma_heap", GROUP="video", MODE="0660"
SUBSYSTEM=="dma_heap", KERNEL=="system", PROGRAM="/bin/grep -q \"^Revision\s*:\s*[ 123][0-9a-fA-F][0-9a-fA-F]4[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]$$\" /proc/cpuinfo", SYMLINK+="dma_heap/vidbuf_cached", GOTO="dma_end"
SUBSYSTEM=="dma_heap", KERNEL=="linux,cma", SYMLINK+="dma_heap/vidbuf_cached"
LABEL="dma_end"
Or did you just not include this?
I removed everything else because I don't think it's needed. What I pasted above should be the whole file.
It's looking promising, I've replaced my rules file with what you've posted above on all 6 PIs, and I've done about 7 power cycles without failure (i.e. it's always set to system
)
Thank you so much for this (both of you)
As we have 60+ PIs in various places, should I wait for an official release, or do these things take a long time to go through QA/Release etc?
If it will take some time, I can write a script which replaces this file via an SSH script, and update them all remotely.
I would expect the fix to ship some time next week
Just tested that it works as intended on older pi models as well and shipped the update.
Many thanks for spotting and helping track down the issue. It would've caused a lot of other problems which could've been written off as "couldn't reproduce it, you must be doing something wrong".
Thank you @XECDesign and @davidplowman for taking the time to look into this and resolving it so quickly. Is this now just a matter of doing an apt upgrade
to resolve this issue?
sudo apt update
sudo apt full-upgrade
Hello,
using an imx519 on RPI4 I see similar issue. It was working in the past. I usually use a lores preview stream and switch config for high quality picture on request. It works like 2-4 times to switch, then the system runs out of cma memory:
2024-09-14 09:10:56,247 [ INFO] switch_mode invoked, stopping stream encoder, switch mode and restart encoder (picamera2.py:338)
2024-09-14 09:10:56,741 [ INFO] Camera stopped (picamera2.py:1216)
[0:34:34.782866469] [3925] INFO Camera camera.cpp:1191 configuring streams: (0) 4656x3496-BGR888 (1) 2328x1748-YUV420 (2) 4656x3496-SRGGB10_CSI2P
[0:34:34.794527187] [3920] INFO RPI vc4.cpp:630 Sensor: /base/soc/i2c0mux/i2c@1/imx519@1a - Selected sensor format: 4656x3496-SRGGB10_1X10 - Selected unicam format: 4656x3496-pRAA
2024-09-14 09:10:56,756 [ INFO] Configuration successful! (picamera2.py:1073)
2024-09-14 09:10:56,836 [ DEBUG] Allocated 3 buffers for stream 0 with fds [40, 45, 50] (dmaallocator.py:64)
2024-09-14 09:10:56,848 [ DEBUG] Allocated 3 buffers for stream 1 with fds [57, 66, 71] (dmaallocator.py:64)
2024-09-14 09:10:56,880 [ DEBUG] Allocated 3 buffers for stream 2 with fds [74, 85, 88] (dmaallocator.py:64)
[0:34:34.992476933] [3920] ERROR V4L2 v4l2_videodevice.cpp:1267 /dev/video15[27:cap]: Unable to request 1 buffers: Nicht genügend Hauptspeicher verfügbar
[0:34:34.992570598] [3920] ERROR RPI pipeline_base.cpp:699 Failed to allocate buffers
2024-09-14 09:10:56,961 [ ERROR] Failed to start camera: Nicht genügend Hauptspeicher verfügbar (picamera2.py:348)
Traceback (most recent call last):
File "/home/pi/.local/pipx/venvs/photobooth-app/lib/python3.11/site-packages/photobooth/services/backends/picamera2.py", line 346, in _switch_mode
self._picamera2.switch_mode(self._current_config)
File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1428, in switch_mode
return self.dispatch_functions(functions, wait, signal_function, immediate=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1354, in dispatch_functions
return job.get_result(timeout=timeout) if wait else job
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/picamera2/job.py", line 79, in get_result
return self._future.result(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/lib/python3/dist-packages/picamera2/job.py", line 48, in execute
done, result = self._functions[0]()
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1422, in switch_mode_
self.start_()
File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1145, in start_
self.camera.start(controls)
RuntimeError: Failed to start camera: Nicht genügend Hauptspeicher verfügbar
2024-09-14 09:10:56,970 [CRITICAL] error switching mode in picamera due to Failed to start camera: Nicht genügend Hauptspeicher verfügbar (picamera2.py:349)
pi@photobooth-glitz:~/photobooth-data $ ls -l /dev/dma_heap/
insgesamt 0
crw-rw---- 1 root video 253, 1 13. Sep 22:04 linux,cma
crw-rw---- 1 root video 253, 0 13. Sep 22:04 system
lrwxrwxrwx 1 root root 9 13. Sep 22:04 vidbuf_cached -> linux,cma
I did full-upgrade, but did not change anything. On Pi4 it should use the cma I think - but it seems to not free the cma after switching.
app started:
pi@photobooth-glitz:~ $ cat /proc/meminfo | grep Cma
CmaTotal: 524288 kB
CmaFree: 232864 kB
switching back and forth between modes here. CmaFree is between 300MB and 150MB until it drops further and now it runs out of memory
pi@photobooth-glitz:~ $ cat /proc/meminfo | grep Cma
CmaTotal: 524288 kB
CmaFree: 83776 kB
camera backend restarted to recover:
pi@photobooth-glitz: $ cat /proc/meminfo | grep Cma
CmaTotal: 524288 kB
CmaFree: 407452 kB
@mgineer85 Hi, and thanks for the report. The problem under discussion in this thread only occurs on Pi 5, so yours must be quite different. Please file a new report for a new problem. Be sure to state: your model of Pi, the camera, and confirm that everything is up to date. If possible, please also include a short script (no more than a dozen lines, if possible) that demonstrates the failure. Thanks!
I start to give up on using Arducams because of their poor integration. It seems after the latest kernel updates their driver is broken, autofocus does not work any more. The error posted above still persists, I will check if I can reproduce with a simple script and native hardware.
There have been a few kernel related changes recently because of the Hailo device and the new AI camera, but hopefully things are going to settle down a bit now, so once Arducam have caught up again maybe things will stay working for a bit longer. (As I'm sure you know, we encourage vendors to upstream support for their cameras, and then this kind of thing would happen less.)
I have a python script which starts a camera in 1/2 resolution, then when I'm ready, I switch to full resolution and capture. This works for the majority of the time, but occasionally I'll get a memory error.
When this error occurs, it will keep occurring until I reboot (restarting the script is insufficient), however I can happily run my test script multiple times without issue (on the same PI) so something is getting it into a state.
I am unable to replicate consistently, I have to wait for it to error, then do some tests.
I've simplified my code, and condensed it down to a simple
python
file.This results in the following output
If I look at
dmesg
I can see some cma_aloc errors, and at one point adma
error.I'm running with the following hardware Raspberry Pi 5 (4gb memory) OS - Debian GNU/Linux 12 (bookworm) Uname - 6.6.31+rpt-rpi-2712 Camera - Arducam 64mp (Hawkeye)
Free
resultslibcamera-still --version
results