ivan-ushakov opened this issue 1 month ago
I don't have a lot of experience with libcamera, so I'm not sure if I can be of any help. However, the first thing to try is to compile everything with the address and undefined behavior sanitizers enabled.
You could also try placing a hardware watchpoint on the refcount to see where it is decremented from 2 to 1.
Also, are you sure that the camera isn't removed from another thread? Do you actually see two destructor calls for the same camera object?
It looks like a memory layout problem with the shared_ptr type. I use the GCC 12.4 toolchain on my Linux machine, but the Raspberry Pi Zero has GCC 12.2, and maybe this is the reason.
For example, in the disassembly of camera_added I don't see any shared_ptr counter manipulation methods at all:
Thread 4 "camera-bot" hit Breakpoint 3, camera::CameraService::Context::camera_added (camera=std::shared_ptr<libcamera::Camera> (use count 1, weak count -1) = {...}) at /workspaces/RaspberryCamera/src/camera_service.cpp:102
102 logger::debug(fmt::format("CameraService: camera added with use_count={:d} p={:x}", camera.use_count(),
(gdb) disassembly
Undefined command: "disassembly". Try "help".
(gdb) disassemble
Dump of assembler code for function _ZN6camera13CameraService7Context12camera_addedESt10shared_ptrIN9libcamera6CameraEE:
=> 0x00058ec8 <+0>: vldr d7, [pc, #152] @ 0x58f68 <_ZN6camera13CameraService7Context12camera_addedESt10shared_ptrIN9libcamera6CameraEE+160>
0x00058ecc <+4>: push {r4, lr}
0x00058ed0 <+8>: ldr r3, [r0, #4]
0x00058ed4 <+12>: sub sp, sp, #80 @ 0x50
0x00058ed8 <+16>: cmp r3, #0
0x00058edc <+20>: ldrne r3, [r3, #4]
0x00058ee0 <+24>: mov r2, #54 @ 0x36
0x00058ee4 <+28>: vstr d7, [sp, #40] @ 0x28
0x00058ee8 <+32>: ldr r1, [pc, #128] @ 0x58f70 <_ZN6camera13CameraService7Context12camera_addedESt10shared_ptrIN9libcamera6CameraEE+168>
0x00058eec <+36>: str r3, [sp, #24]
0x00058ef0 <+40>: str r1, [sp, #16]
0x00058ef4 <+44>: ldr r3, [r0]
0x00058ef8 <+48>: add r1, sp, #24
0x00058efc <+52>: str r1, [sp, #48] @ 0x30
0x00058f00 <+56>: add r4, sp, #40 @ 0x28
0x00058f04 <+60>: str r2, [sp, #20]
0x00058f08 <+64>: str r3, [sp, #32]
0x00058f0c <+68>: ldm r4, {r0, r1, r2, r3}
0x00058f10 <+72>: stm sp, {r0, r1, r2, r3}
0x00058f14 <+76>: add r12, sp, #16
0x00058f18 <+80>: ldm r12, {r1, r2}
0x00058f1c <+84>: add r0, sp, #56 @ 0x38
0x00058f20 <+88>: bl 0x67d60 <_ZN3fmt3v117vformatB5cxx11ENS0_17basic_string_viewIcEENS0_17basic_format_argsINS0_7contextEEE>
0x00058f24 <+92>: ldrd r2, [sp, #56] @ 0x38
0x00058f28 <+96>: str r2, [sp, #44] @ 0x2c
0x00058f2c <+100>: str r3, [sp, #40] @ 0x28
0x00058f30 <+104>: ldm r4, {r0, r1}
0x00058f34 <+108>: bl 0x59ce4 <_ZN6camera6logger5debugESt17basic_string_viewIcSt11char_traitsIcEE>
0x00058f38 <+112>: ldr r0, [sp, #56] @ 0x38
0x00058f3c <+116>: add r3, sp, #64 @ 0x40
0x00058f40 <+120>: cmp r0, r3
0x00058f44 <+124>: beq 0x58f54 <_ZN6camera13CameraService7Context12camera_addedESt10shared_ptrIN9libcamera6CameraEE+140>
0x00058f48 <+128>: ldr r1, [sp, #64] @ 0x40
0x00058f4c <+132>: add r1, r1, #1
0x00058f50 <+136>: bl 0x1d910 <_ZdlPvj@plt>
0x00058f54 <+140>: add sp, sp, #80 @ 0x50
0x00058f58 <+144>: pop {r4, pc}
0x00058f5c <+148>: add r0, sp, #56 @ 0x38
0x00058f60 <+152>: bl 0x1dc34 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_disposeEv@plt>
0x00058f64 <+156>: bl 0x1de38 <__cxa_end_cleanup@plt>
0x00058f68 <+160>: andeq r0, r0, r1, lsr #32
0x00058f6c <+164>: andeq r0, r0, r0
0x00058f70 <+168>: andeq r6, r8, r4, lsl r7
End of assembler dump.
gdb also shows incorrect values for the use count and weak count.
> I use toolchain GCC 12.4 on my Linux machine but Raspberry Pi Zero has GCC 12.2 and maybe this could be a reason.
If there's an ABI break in shared_ptr between GCC 12.2 and 12.4, then that would be a GCC bug, so this is unlikely to be the reason.
> For example in disassemble of camera_added I don't see shared_ptr counter manipulation methods at all:
That's correct: the caller is responsible for constructing and destructing the function arguments. https://itanium-cxx-abi.github.io/cxx-abi/abi.html#non-trivial-parameters
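To illustrate the point about the caller owning the parameter, here is a minimal standalone sketch (not libcamera code; `Camera`, `use_count_inside`, `pass_copy` and `pass_moved` are hypothetical names). The callee never adjusts the count itself; what it observes depends entirely on whether the caller passed a copy or moved its only reference in.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for libcamera::Camera; any type works here.
struct Camera {};

// Takes the shared_ptr by value, like camera_added. The callee contains no
// reference-count code: the caller copy- or move-constructs the parameter
// before the call and destroys it afterwards.
long use_count_inside(std::shared_ptr<Camera> camera) {
    return camera.use_count();
}

// The caller keeps its own reference and passes a copy: the count is 2 inside.
long pass_copy() {
    auto owner = std::make_shared<Camera>();
    long inside = use_count_inside(owner);
    assert(owner.use_count() == 1);  // the caller has already destroyed the parameter copy
    return inside;
}

// The caller hands over its only reference: the count is legitimately 1 inside.
long pass_moved() {
    auto owner = std::make_shared<Camera>();
    return use_count_inside(std::move(owner));
}
```

So a use count of 1 inside a by-value callback is expected if the caller moved its pointer in; it only points to a problem when the caller is known to keep its own reference alive.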
I think I found the problem. Could you tell me how this flag is configured in opt/x-tools/armv6-rpi-linux-gnueabihf/armv6-rpi-linux-gnueabihf/include/c++/12.4.0/armv6-rpi-linux-gnueabihf/bits/c++config.h?
/* Defined if shared_ptr reference counting should use atomic operations. */
#define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1
As I understand it, if some library on my device is compiled without _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY defined, then it will use __default_lock_policy = _S_mutex, and this could be the reason for the strange behaviour of my application.
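A quick way to check which policy a given toolchain's headers select is to ask libstdc++ directly. This is a minimal sketch relying on libstdc++ internals (`<ext/concurrence.h>`, `__gnu_cxx::__default_lock_policy`), so it will not compile against other standard libraries; `default_lock_policy_name` is a hypothetical helper name.

```cpp
#include <ext/concurrence.h>  // libstdc++ only: __gnu_cxx::_Lock_policy
#include <string_view>

// Reports the lock policy libstdc++ selected for shared_ptr reference counting
// in this translation unit. <ext/concurrence.h> picks _S_atomic when
// _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY is defined, _S_mutex when threads are
// supported but that macro is not defined, and _S_single without thread support.
constexpr std::string_view default_lock_policy_name() {
    switch (__gnu_cxx::__default_lock_policy) {
    case __gnu_cxx::_S_single: return "_S_single";
    case __gnu_cxx::_S_mutex:  return "_S_mutex";
    case __gnu_cxx::_S_atomic: return "_S_atomic";
    }
    return "unknown";
}
```

Compiling this once with the cross toolchain and once natively on the Pi, then printing the result, should show whether the two disagree.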
Good catch!
It looks like Debian uses _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY=1 on armhf (which appears to be GCC's default), but Raspberry Pi OS does not ...
I'll have to investigate further.
The difference appears to be that Raspberry Pi OS is compiled using the armv6 architecture, whereas the ARM1176JZF-S processor used by the BCM2835-based RPis uses the armv6kz architecture (ARMv6 with the ARMv6K multiprocessor and TrustZone Security Extensions). So even though the hardware supports atomic instructions, they are disabled in Raspberry Pi OS, and this explains the differences in the implementation of shared_ptr.
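Why this shows up as a wrong use count rather than a link error can be sketched with libstdc++ internals (illustrative only: `std::_Sp_counted_base` is an implementation detail, and the constant names below are hypothetical). `std::shared_ptr<T>` has a single template parameter, so its mangled name (e.g. `St10shared_ptrIN9libcamera6CameraEE` in the disassembly above) is identical under both policies, while the control block layout is not: under `_S_mutex` the base class also inherits a mutex, which moves the counters.

```cpp
#include <cstddef>
#include <memory>  // libstdc++: brings in std::_Sp_counted_base and __gnu_cxx::_Lock_policy

// Size of the reference-count base class of the shared_ptr control block
// under each lock policy. The _S_mutex variant also inherits a
// __gnu_cxx::__mutex, so its use/weak counters live at different offsets.
constexpr std::size_t atomic_ctrl_size =
    sizeof(std::_Sp_counted_base<__gnu_cxx::_S_atomic>);
constexpr std::size_t mutex_ctrl_size =
    sizeof(std::_Sp_counted_base<__gnu_cxx::_S_mutex>);

// Code built with one policy that touches a control block created with the
// other would therefore read and write the counters at the wrong place.
static_assert(mutex_ctrl_size > atomic_ctrl_size,
              "the mutex policy adds a mutex to the control block");
```

The exact sizes on the Pi differ from a 64-bit host, but the mutex variant is larger in both cases. This would also be consistent with the implausible weak count of -1 printed by gdb, although that connection is an inference, not something verified here.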
Since the toolchains in tttapa/docker-arm-cross-toolchain compile for the ARM1176JZF-S processor specifically, they implicitly have atomic operations enabled. The fix is simple: build for a generic ARMv6 CPU without any extensions, perhaps with -mtune=arm1176jzf-s. I hope to push a fix soon.
This will result in a performance penalty if your code uses atomic operations (such as shared_ptr reference counting), but it is the only way to ensure binary compatibility with packages compiled for Raspberry Pi OS.
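Until the rebuilt toolchains are in use, a project could turn a silent mismatch into a compile error. This is a minimal sketch assuming libstdc++ and a made-up opt-in macro `RPI_OS_ARMV6_ABI` that the project would define itself when targeting Raspberry Pi OS; `uses_mutex_lock_policy` is likewise a hypothetical helper. It only checks the headers of the compiler doing the build, which is the side that can differ from Raspberry Pi OS.

```cpp
#include <memory>  // with libstdc++ this also pulls in <ext/concurrence.h>

// True when shared_ptr reference counting in this build uses the mutex-based
// policy, which is what this thread reports for Raspberry Pi OS's armv6 libstdc++.
constexpr bool uses_mutex_lock_policy() {
    return __gnu_cxx::__default_lock_policy == __gnu_cxx::_S_mutex;
}

// RPI_OS_ARMV6_ABI is a hypothetical macro defined by the project (e.g. via
// -DRPI_OS_ARMV6_ABI). With it set, a toolchain whose libstdc++ picks the
// atomic policy fails at compile time instead of corrupting use counts at runtime.
#if defined(RPI_OS_ARMV6_ABI)
static_assert(uses_mutex_lock_policy(),
              "shared_ptr lock policy does not match Raspberry Pi OS");
#endif
```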
I've created a new release for https://github.com/tttapa/docker-arm-cross-toolchain. The new toolchains are currently being built.
Great! I hope to test the new toolchain next week, and after that I'll close this issue.
Hello,
First of all, thank you for providing these cross-compilation toolchains for Raspberry Pi. I use a Raspberry Pi Zero (without W) and I've run into a problem I cannot solve. I followed your guide and can successfully build and run a project with libcamera. The problem is with the shared_ptr type: it has the wrong use_count.
For example, the std::shared_ptr<libcamera::Camera> inside camera_added reports a use_count of 1, but it should be 2. When execution leaves this function, the libcamera::Camera object is destroyed because the counter drops to zero.
It looks like shared_ptr on the Linux machine used for cross-compilation is not compatible with shared_ptr on the device, but I don't understand how this is possible, since GCC should keep a stable ABI. I tried different GCC versions from your toolchains (14 and 12), and the problem is the same. The libcamera version is the same on the device and on the Linux machine (I use mk_sbuild).