The program basically works by making a guess of every camera location and every tag location. Given these guesses, it creates a virtual camera for every image, makes virtual detections of every tag, and checks how well the virtual detections match the actual detections provided as input. This is a classic SLAM pose-graph architecture.
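To make that concrete, here is a rough sketch of the "virtual detection" idea using OpenCV (illustrative values only; this is not pytagmapper's actual code or data structures):

```python
import numpy as np
import cv2

# Illustrative guesses: a guessed camera pose (Rodrigues rotation +
# translation, world -> camera) and the guessed world positions of one
# tag's four corners.
rvec_guess = np.zeros(3)
tvec_guess = np.array([0.0, 0.0, 1.0])
tag_corners_world = np.array([[-0.05,  0.05, 0.0],
                              [ 0.05,  0.05, 0.0],
                              [ 0.05, -0.05, 0.0],
                              [-0.05, -0.05, 0.0]])
camera_matrix = np.array([[800.0,   0.0, 320.0],
                          [  0.0, 800.0, 240.0],
                          [  0.0,   0.0,   1.0]])

# "Virtual detections": the guessed tag corners as seen by the guessed
# virtual camera. Inputs are assumed already undistorted, hence
# distCoeffs=None.
virtual_detections, _ = cv2.projectPoints(
    tag_corners_world, rvec_guess, tvec_guess, camera_matrix, None)
virtual_detections = virtual_detections.reshape(-1, 2)
```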
get_avg_detection_error is the average pixel difference between the virtual detections and the actual detections. The optimizer works by jiggling all the guesses so that the virtual detections line up better with the actual detections.
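If it helps, an error metric of that kind looks roughly like this (a sketch, not pytagmapper's literal implementation; it assumes both inputs are (N, 2) arrays of matched corner pixels):

```python
import numpy as np

def avg_detection_error(virtual_detections, actual_detections):
    # Mean Euclidean pixel distance between matched corners, both (N, 2).
    return np.linalg.norm(virtual_detections - actual_detections, axis=1).mean()
```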
Recently, some users have been reporting large errors and a lack of convergence like yours. Could you tell me what type of camera you are using? It is very important for inputs to pytagmapper to be undistorted and calibrated. It is also important that the camera does not have any sort of autofocus, which could change the camera parameters while you are taking your images.
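For what it's worth, with OpenCV capture you can often ask the driver to turn autofocus off (a sketch; whether these properties take effect depends on the camera and backend):

```python
import cv2

cap = cv2.VideoCapture(0)
# Try to lock the focus so the intrinsics stay constant across all frames.
cap.set(cv2.CAP_PROP_AUTOFOCUS, 0)  # 0 = autofocus off, where supported
cap.set(cv2.CAP_PROP_FOCUS, 0)      # fixed focus value; range is driver-specific
```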
Thanks for the quick reply, and sorry for the late answer on my side; I was busy with other projects. So:
I'm using a Logitech C270 webcam. I did the calibration (based on that link), so I have the camera matrix, and I also undistort the frames before doing any further analysis, detection, etc., using `cv2.undistort(frame, camera_matrix, distortion_coefficients)`. At the moment I'm testing other cameras.
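For reference, the undistortion step looks roughly like this (the calibration numbers below are placeholders, not my real values):

```python
import cv2
import numpy as np

# Placeholder calibration results -- substitute the real values from
# cv2.calibrateCamera.
camera_matrix = np.array([[1400.0,    0.0, 640.0],
                          [   0.0, 1400.0, 360.0],
                          [   0.0,    0.0,   1.0]])
distortion_coefficients = np.array([-0.25, 0.1, 0.0, 0.0, 0.0])

frame = cv2.imread("frame_0000.png")  # placeholder input frame

# Undistort before any detection or further analysis.
undistorted = cv2.undistort(frame, camera_matrix, distortion_coefficients)
```

One thing I'm double-checking on my side: the camera matrix handed to the mapper should describe the undistorted frames it actually receives.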
Regarding the first section of your answer, am I right that it works like this:
Finally, could you please give me some insight into what the crucial points are when selecting frames from a video stream to analyze? E.g. how different or similar the viewpoints should be (frames from similar or different angles), how many markers should be detected in one frame to successfully build the map, how many frames I need of each marker, etc.?
Thanks a lot!
To match up with common terminology in SLAM literature, you just need to know that in pytagmapper, the "robot poses" are camera positions, and the "landmarks" are tag positions. The "factors" correspond to tag detections, one for each detected tag in each image.
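A rough picture of that bookkeeping in Python (illustrative only, not pytagmapper's actual data structures):

```python
# The SLAM vocabulary mapping, as plain containers.
camera_poses = {}  # "robot poses": image_id -> guessed camera pose (4x4 matrix)
tag_poses = {}     # "landmarks":   tag_id   -> guessed tag pose (4x4 matrix)
factors = []       # "factors": one per detected tag in each image

# One factor ties a camera pose and a tag pose together via the
# observed corner pixels of that tag in that image.
factors.append({
    "image_id": 0,
    "tag_id": 42,
    "detected_corners": [(310.5, 220.1), (355.2, 221.8),
                         (354.9, 265.0), (309.8, 264.3)],  # pixels
})
```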
Thanks for the reply! At the moment I'm testing different cameras, and they seem to work better. I'll reply in detail once I get results. Thanks!
Hello markisus! Using a better camera solved part of the problem. I also added a loop that searches for images that get stuck; after removing them from the input for the map building, it executes nicely and builds the map. Thanks for your help, and the nice project!
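Roughly, the idea of the loop was this (a sketch; `build_map` and `image_error` are hypothetical stand-ins, not pytagmapper's real entry points or signatures):

```python
ERROR_THRESHOLD = 5.0  # pixels; an assumed tuning value

good_images = list(all_images)  # all_images: your candidate frames (assumption)
while True:
    result = build_map(good_images)  # hypothetical stand-in for the map build
    worst = max(good_images, key=lambda im: image_error(result, im))  # hypothetical
    if image_error(result, worst) < ERROR_THRESHOLD:
        break  # every remaining image reprojects well; keep the map
    good_images.remove(worst)  # drop the image that gets stuck and retry
```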
Hello markisus, thanks for your nice project! I started working with your code (pytagmapper_tools/build_map.py and show_map.py) and have run into some difficulties. Hopefully you can help with some answers. Here I explain what I'm doing:
Do you have any idea where the problem lies? Also, could you please explain what get_avg_detection_error does and what this error means?
Thanks a lot!