introlab / rtabmap_ros

RTAB-Map's ROS package.
http://wiki.ros.org/rtabmap_ros
BSD 3-Clause "New" or "Revised" License

[Help] Starting point for using rtabmap_ros with multiple monocular camera #1123

Open aaalloc opened 8 months ago

aaalloc commented 8 months ago

Hi, I have multiple questions.

I'm currently working on a project where I have multiple IP cameras and I need to detect and locate objects. Is rtabmap suitable for this?

I've seen this repository: https://github.com/ROBOTIS-JAPAN-GIT/turtlebot3_slam_3d and it is exactly what I need to do, but with multiple stationary cameras (pan/tilt is possible). I've also seen that it is possible to use only an RGB and a depth image, according to https://github.com/introlab/rtabmap/issues/1071.

My goal for now is to make things work with only one camera and see if I can expand it later, but is that possible with rtabmap_ros? If so, could you give an example of how to do that? I understand that I need to provide topics for these fields: camera_info_topic, depth_topic, rgb_topic, and I have already prepared a ROS node that publishes Image messages; I just don't really understand how to plug things together to make it work.
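For a single RGB-D camera, a minimal invocation could look roughly like the sketch below. The `/my_camera/...` topic names are placeholders for whatever your own camera node actually publishes (they are not from this thread), and the RGB and depth images need to be registered and time-synchronized:

```shell
# Sketch only: point rtabmap.launch's inputs at your camera's topics.
# /my_camera/* are hypothetical names; substitute your node's real topics.
roslaunch rtabmap_ros rtabmap.launch \
    rgb_topic:=/my_camera/rgb/image_rect_color \
    depth_topic:=/my_camera/depth_registered/image_raw \
    camera_info_topic:=/my_camera/rgb/camera_info
```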

matlabbe commented 8 months ago

Can you detail the setup more? Would there be multiple static monocular cameras on pan/tilt servos around a room? I am not sure rtabmap would be ideal, as rtabmap primarily targets mobile robots/cameras by default. Do you need a 3D reconstruction, or just to detect/track the 3D pose of objects in the space?

For the title, you cannot use multiple monocular cameras unless you can generate depth for them somehow (from depth cameras or lidar).

aaalloc commented 8 months ago

Would there be multiple static monocular cameras on pan/tilt servo around a room?

Yes that's the idea

just detecting/tracking 3D pose of objects in the space?

Yes, when an object is detected and localised, the other cameras need to be aware of it too. I thought that maybe for that I would need a cloud map built from the distinct cameras and shared across all of them, but I don't know... Another idea was to do stereo depth mapping, but I have no leads on doing that with more than 2 cameras.

you cannot use multiple monocular cameras unless you can generate depth for them somehow (from depth cameras or lidar).

I could somehow manage to generate depth with an AI model like Depth-Anything or ZoeDepth, but that would be expensive, so I thought maybe there is a way to do it without them, given that I can have multiple views of the room.

matlabbe commented 8 months ago

Reconstructing 3D space from multiple static monocular cameras can be a hard/expensive task. The systems I am thinking of require >10 cameras for quite a small volume. They do photogrammetry offline or even in real time, but many points of view looking at the same thing are required.
image

That's why I asked if you need 3D reconstruction or just tracking. For tracking, the system could look more like what OptiTrack or VICON systems do, though they require specific targets (like the small spheres used for motion capture). Once the cameras know where they are relative to each other in the space, you can then track the same object across them.
image
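As an illustrative sketch (not part of rtabmap): once the cameras' poses are known, tracking an object across them reduces to intersecting the viewing rays each camera casts toward the detection. A least-squares ray intersection in NumPy could look like this; all names here are hypothetical:

```python
import numpy as np

def triangulate_rays(origins, directions):
    """Least-squares intersection of 3D rays.

    Each ray is (origin, direction): the camera center and the direction
    toward the detected object, both expressed in the shared global frame.
    Minimizes the sum of squared distances from the point to every ray.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)  # projects onto the plane orthogonal to the ray
        A += M
        b += M @ np.asarray(o, dtype=float)
    return np.linalg.solve(A, b)
```

Two cameras at (0,0,0) and (2,0,0) both seeing an object at (1,1,0) would recover that point exactly; with more cameras, detection noise gets averaged out.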

AI depth with a monocular camera would give you depth, but without scale. With stereo cameras, you would get true depth. If you place 3 stereo cameras on pan/tilt mounts around a room, then after you provide the relative TF between all of them, you would not need a SLAM package, as you would know the global position/rotation of each camera at any moment in the same global frame. To scan in 3D, you may just accumulate the point clouds while the cameras are rotating. The most difficult part is to correctly calibrate the extrinsics between all the stereo cameras (i.e., their accurate relative positions), so that the point cloud generated from one camera overlaps correctly with the point cloud computed from another camera looking at the same thing.
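The "accumulate the point clouds" step above amounts to applying each camera's known extrinsic and concatenating. A minimal NumPy sketch (hypothetical names, assuming each 4x4 camera-to-global transform comes from your calibrated TF tree):

```python
import numpy as np

def transform_cloud(points, T):
    """Apply a 4x4 homogeneous transform T (camera frame -> global frame)
    to an (N, 3) point cloud."""
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    return (homo @ T.T)[:, :3]

def accumulate(clouds, transforms):
    """Merge per-camera clouds into one global cloud: transform each with
    its camera's extrinsic, then concatenate."""
    return np.vstack([transform_cloud(p, T)
                      for p, T in zip(clouds, transforms)])
```

If the extrinsics are miscalibrated, the per-camera clouds simply won't line up in the merged result, which is the overlap problem described above.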

aaalloc commented 8 months ago

With stereo cameras, you would get true depth.

Unfortunately I can't replace the monocular cameras with stereo cameras, but do you think it is feasible to build a stereo camera from 2 monocular cameras placed far apart, like this?

image
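For intuition on what such a wide-baseline pair would give: the standard pinhole stereo relation Z = f·B/d maps disparity to depth, and a larger baseline B produces larger disparities (hence better depth resolution) at long range, at the cost of harder stereo matching and calibration. A toy sketch with illustrative numbers (not from the thread):

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo: depth Z = f * B / d, with focal length f in pixels,
    baseline B in meters, and disparity d in pixels."""
    return focal_px * baseline_m / np.asarray(disparity_px, dtype=float)

# Illustrative numbers only: a 2 m baseline and a 700 px focal length.
# A 35 px disparity then corresponds to 700 * 2 / 35 = 40 m of depth.
print(depth_from_disparity(35, 700.0, 2.0))
```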

matlabbe commented 8 months ago

Theoretically yes, though practically difficult to set up and calibrate.

This is what I was thinking you were doing (tracking an object from multiple monocular cameras in a room):
image

Interesting survey: "Multi-camera multi-object tracking: A review of current trends and future advances" https://www.sciencedirect.com/science/article/pii/S0925231223006811