raspberrypi / picamera2

New libcamera based python library
BSD 2-Clause "Simplified" License

Pi5 with two cameras, maximum resolution causes error when 'starting' both #1035

Open ricardopretrazy opened 1 month ago

ricardopretrazy commented 1 month ago

I've written some software to take two photos at the same time (within a fraction of a second of each other, hopefully). To do this, I create two instances of Picamera2, then call .start() on each of them.

When I choose a lower resolution, the code works fine, and I'm able to grab two images, but if I try to use the maximum resolution of both cameras, I get an error.

from picamera2 import Picamera2

print("instantiating 1")
pi_zero = Picamera2(0)
print(pi_zero.sensor_modes)
config_zero = pi_zero.create_still_configuration()
pi_zero.configure(config_zero)

print("instantiating 2")
pi_one = Picamera2(1)
print(pi_one.sensor_modes)
config_one = pi_one.create_still_configuration()
pi_one.configure(config_one)

print('starting 1')
pi_zero.start()
print('starting 2')
pi_one.start()

print('capturing 1')
pi_zero.capture_file('first.jpg')
print('capturing 2')
pi_one.capture_file('second.jpg') # It shows an error and hangs on this line

print('stopping 1')
pi_zero.stop()
print('stopping 2')
pi_one.stop()

print('closing 1')
pi_zero.close()
print('closing 2')
pi_one.close()

I would expect the above code to produce two images at the full sensor resolution.

The relevant error/info logs are:

instantiating 1
instantiating 2
starting 1
starting 2
capturing 1
capturing 2
[1:01:49.221951758] [4776] ERROR V4L2 v4l2_videodevice.cpp:1697 /dev/video35[52:cap]: Failed to queue buffer 1: Invalid argument
[1:01:49.221983647] [4776] ERROR RPISTREAM rpi_stream.cpp:276 Failed to queue buffer for ISP TDN Output

Hardware:
• Raspberry Pi 5 (4GB)
• Arducam 64MP Hawkeye (arducam_64mp), 9152x6944
• High Quality Camera Module (imx477), 4056x3040

Software:
• Debian GNU/Linux 12 (bookworm)
• rpicam-apps build: 966fa99736d8-intree-dirty 24-04-2024 (09:27:41)
• libcamera build: v0.1.0+320-5b2af7e6-dirty (2024-04-24T09:02:46+01:00)
• python3-picamera2/stable,stable,now 0.3.18-1

davidplowman commented 1 month ago

Hi, could you say what you mean by "choose a lower resolution"? Does the 64MP camera have a 2x2 binned mode that makes this work? It might also be worth checking whether there is anything in dmesg.

I'm afraid I don't have access to one of these 64MP cameras, so let's perhaps try and find the most minimal change that makes it work, and see if we can figure out what the difference then is.

ricardopretrazy commented 1 month ago

Hi @davidplowman

If I modify the code to have

config_zero = pi_zero.create_still_configuration(main={'format': 'XBGR8888', 'size': (640, 480)}, raw=None)

With the above parameters, it is able to start. I can't remember exactly what resolution it stopped working at, but I can re-test if you think there's a chance it can be fixed.

I read somewhere that it might be related to limited CMA memory. So it might be related to the hardware rather than the picamera2 software.

davidplowman commented 1 month ago

It's hard to see what's going on. CMA memory is actually much less of a constraint on a Pi 5 because it has an IO MMU. So far as we are aware, there are no specific constraints on the TDN (temporal denoise) buffer, so we're struggling to see where things would be going wrong. Again, it would be good to know if dmesg shows any errors.

The TDN buffer is allocated to match the size of the input image from the sensor, so it's whatever size has been selected for the raw stream. We assume from the message that it's this buffer that is causing the problem, but it's possible that something earlier is going wrong and it only shows up here. It might be worth trying to change the 64MP camera to select a non-full-resolution sensor mode. Assuming that pi_one is the camera in question, maybe try:

half_resolution = (pi_one.sensor_resolution[0] // 2, pi_one.sensor_resolution[1] // 2)
config_one = pi_one.create_still_configuration(raw={'size': half_resolution})

Also, it might be worth capturing a log file. This kind of thing should work:

LIBCAMERA_LOG_LEVELS=*:0 python script.py >& log.txt

The file would be quite large, I expect, so you might want to post it somewhere where we can download it.

I'm sorry not to have any particular answers at the moment, just lots of questions. I guess I'm wondering all kinds of things, like: does the 64MP camera work on its own? Does it work in rpicam-apps? And so on.

ricardopretrazy commented 1 month ago

Hello,

This is the output from dmesg - I tried running my script twice to ensure that these entries relate to what I'm doing.

[screenshot: dmesg output]

If I modify the code to use ½ resolution (as you suggest above) it then captures an image from both cameras. And the output from dmesg only includes the link rate messages.

davidplowman commented 1 month ago

OK, that's pretty interesting. It does suggest that some kind of buffer mapping in the MMU might be the problem. There is a lot of address space there, on the other hand these 64MP buffers can get very large indeed. I'll poke around a bit.

ricardopretrazy commented 1 month ago

I've produced the log files (as you suggest above). I've sent you a message on LinkedIn; if you're happy to exchange emails, I can send over the logs. Like you said, they're quite big, but lots of messages come up when it fails.

davidplowman commented 1 month ago

Are you able to upload to Google Drive or somewhere like that? Otherwise LinkedIn is OK, so long as I recognise the connection request name! (I normally decline connections from folks I don't know because I get so many!!)

ricardopretrazy commented 1 month ago

camera-logs.zip

Actually, compressed, both logs are only 400KB

davidplowman commented 1 month ago

Can you confirm that if you run the 64MP camera on its own, that it works?

Doing the sums, I can see that a 64MP + 12MP camera arrangement is starting to approach the limits in the IO MMU driver, but I don't yet understand why it seems actually to be going over. Certainly two 64MP cams would be absolutely too much, but I'd have thought this configuration should squeeze through. More digging required.

ricardopretrazy commented 1 month ago

Yes, it works fine on its own. Even with both cameras plugged in, if I take two photos one after the other it also works at full resolution. But I'm trying to start both cameras, wait for 'focus' on the 64MP one, and then take two photos at the same time.

Just curious: if each pixel is 3 bytes, then the total bytes used for a 64MP frame would be 192MB, which isn't even close to the 4GB of memory the Pi has, so why would two 64MP cams be too much? (Or are some of my assumptions wrong?)

davidplowman commented 1 month ago

The IO MMU is limited to 2GB of allocation, and there are multiple buffers allocated behind the scenes. For example there are the ISP output buffers, there are the input buffers which are filled by the camera before being processed by the ISP, as well as (for example) buffers for dealing with temporal denoise and other things. Finally, because cameras are streaming asynchronously, we need multiple copies of these buffers so that we have "spare" ones to use while downstream parts of the system (or indeed the user's application) are still using the others. So there are way more buffers flying through the pipeline here than you would ever have imagined. In your particular use case, I can easily count over 1GB of allocation, maybe even close to 1.5GB including the 12MP device, but I'm struggling to see 2GB.

davidplowman commented 1 month ago

Hmm, we're thinking the problem is not the number of buffers we allocate as such, it's that buffers may get mapped more than once if different bits of the system are using them. This explains why the amount of address space we're gobbling up is larger than expected. We'll have to think about that.

davidplowman commented 1 month ago

As far as I can tell, it seems that the underlying V4L2 Linux framework is causing quite a few of the buffers to be mapped twice. Once when we "request" the buffers, and then again the first time they're "queued". I don't really understand why this would happen, and there's probably nothing we can really do about it either.

Having said that, some of our buffer allocations could actually be avoided, though there's a certain amount of work required to do that. Also, a good idea would be to let the IO MMU's page tables expand dynamically - but that needs to be done very carefully, as otherwise you brick the entire system. So whilst we can put those things on our to-do list, I don't think there's any immediate prospect of them happening.

In the short term, then, it might be worth trying some workarounds. I'd give something like this a go:

import cv2
import time

from picamera2 import Picamera2

pi_zero = Picamera2(0)
pi_one = Picamera2(1)

# Run a short preview first so AGC/AEC and AWB can settle while the
# cameras are still using small default preview buffers.
pi_zero.start()
pi_one.start()
time.sleep(0.5)
pi_zero.stop()
pi_one.stop()

# YUV420 buffers are half the size of the RGB888 ones.
config_zero = pi_zero.create_still_configuration({'format': 'YUV420'})
pi_zero.configure(config_zero)
config_one = pi_one.create_still_configuration({'format': 'YUV420'})
pi_one.configure(config_one)

pi_zero.start()
pi_one.start()  # probably start the 64MP cam second

im_one_yuv = pi_one.capture_array('main')  # capture and stop the 64MP cam first
pi_one.stop()

im_zero_yuv = pi_zero.capture_array('main')
pi_zero.stop()

pi_zero.close()
pi_one.close()

# Convert the YUV420 arrays "offline", after both cameras have stopped.
cv2.imwrite("first.jpg", cv2.cvtColor(im_zero_yuv, cv2.COLOR_YUV420p2RGB))
cv2.imwrite("second.jpg", cv2.cvtColor(im_one_yuv, cv2.COLOR_YUV420p2RGB))

There are two principal changes here.

  1. I've run 0.5 seconds of preview before switching to the full resolution capture. When the camera starts it normally processes 6 or 7 frames to let AGC/AEC and AWB settle, which means it works its way through quite a lot of buffers, causing them all to be mapped. It's better if we use low resolution preview buffers for this purpose. When we switch to the capture, it should only need to process one full resolution buffer so that, as long as we stop the camera immediately, this should cause fewer large buffers to get mapped. (You can see I've re-arranged things a bit to leave the 64MP cam running as little time as possible.)

  2. I'm using YUV420 buffers as these are half the size of the RGB888 ones. Unfortunately it does mean they have to be converted before saving, but at least this can happen "offline", and doesn't affect when the captures happen. (The quick arithmetic below shows the per-frame difference.)

I've not tried this code, but hopefully it's obvious enough that you get the idea.

ricardopretrazy commented 1 month ago

Hello David,

Thank you for responding with the full example. I have tried it, and really like the idea of doing the processing 'off-line'.

However, there are still some issues with the capture, although they're probably related to the 64MP camera rather than Picamera2.

I mentioned before that I am able to capture full 64MP (9152 x 6944) images with this camera, but it's still not without issue.

I should have mentioned that these issues are also present on a Pi 5 with only the 64MP camera attached.

I really appreciate your help with this issue, it's a learning curve for me, but a very interesting one.

davidplowman commented 1 month ago

Hi again, let me work through those various questions.

  1. Autofocus

You might need to ask Arducam about that. It may be that they have code to support the autofocus which they haven't upstreamed to our repository. We strongly encourage third parties to do this, but sometimes they don't, for reasons of their own. This may mean that you have to install Arducam's forks of our software, but of course that gets hard for us to support as we have no idea what they've changed. Obviously we'll always do our best to be helpful, but you would inevitably become more reliant on Arducam for some aspects of support.

  2. YUV420 capture

This tends to happen because hardware often has constraints on how many bytes it can write to memory at once, and software engineers have historically not always taken this into account.

So on a Pi, the imaging system has to start every new image row on a multiple of 32 (Pi 4, I think) or 64 (Pi 5, IIRC) bytes. Because of the way Linux V4L2 works, this means the Y plane of the image, being double the width of the U and V planes, has to be aligned to double that, thus 128-byte multiples on a Pi 5. This is what gives you those extra unused pixels at the right-hand edge of the image - the image stride (the line-to-line distance in memory, measured in bytes) is rounded up to a multiple of 128.

If you're dealing with (for example) RGB, you can just slice off the dud pixels by creating a new "slice" of the array; you don't even need to copy the image data. This is nice and efficient. YUV420 is more awkward because of the way numpy/OpenCV stores the image. First you get H (the image height) rows of Y data, each with <= 127 bytes of dud pixels on the end (to make the stride up to a multiple of 128). Then you get H/2 rows of UV data, each of which is W/2 (W = image width) bytes of U, plus <= 63 bytes of dud pixels (corresponding to the dud Y pixels), then W/2 bytes of V with again <= 63 dud values.

It should be clear that, because we get dud values in the middle of the array's UV rows (a YUV420 image is stored as H + H/2 equally sized rows), there's no way to get rid of them without actually copying the entire array - which is wasteful and slow.

So the best solution is to turn this array - dud pixels and all - into RGB, and then perform the slice operation on the RGB array. The other solution would be to ask for images with a width satisfying these alignment constraints in the first place (which I guess would be 9088 pixels in this case). Does that make sense?

  3. Metering

So the default (normal, or centre-weighted), matrix (whole-image average) and spot metering methods should all be supported. Unfortunately there's no libcamera API for setting exactly where the "spot" is, so the only place you could change it would be in the camera tuning file. This would take effect when you open the camera, and you couldn't change it once it's running.

Would that cover your use case? Let me know if you want to learn how to edit the camera tuning file; it's not difficult!

ricardopretrazy commented 1 month ago

Thank you so much for your comprehensive response.

  1. Autofocus - this works at lower resolutions as expected; it's only at full resolution that it doesn't do anything
  2. YUV - will need to absorb this info
  3. Metering - when you say 'open' the camera, is this at the point that you:
    • a) boot up the Pi
    • b) create an instance of Picamera2
    • c) 'start' the camera?

Thanks again


davidplowman commented 1 month ago

Thank you so much for your comprehensive response.

  1. Autofocus - this works at lower resolutions as expected; it's only at full resolution that it doesn't do anything

Ah OK, that's a bit weird. I only have a Camera Module 3 here, but I've tried that and AfMode auto works as expected for me in the full resolution mode. Might be worth asking Arducam if they think this should work.

  2. YUV - will need to absorb this info
  3. Metering - when you say 'open' the camera, is this at the point that you:
  • a) boot up the Pi
  • b) create an instance of Picamera2

I think it's this one. I'm not 100% sure whether you need to quit the Python process as well and restart it, but you certainly shouldn't need a reboot.

  • c) 'start' the camera?

Thanks again

b)