kxhit / vMAP

[CVPR 2023] vMAP: Vectorised Object Mapping for Neural Field SLAM
https://kxhit.github.io/vMAP

Running iMAP demo #11

Closed raabuchanan closed 1 year ago

raabuchanan commented 1 year ago

Hi, thanks for open sourcing your code!

I'm just trying to get iMAP running and I have two questions:

  1. I remember a live demo of iMAP at CoRL 2021, and I believe the paper says it can run online. But in this implementation I find the meshing step is very slow (it takes about 30 seconds). Why is this, and how can I get iMAP running online?
  2. Can you explain why my gt_depth and gt_rgb are always randomized images with dimensions different from the input? I thought the loss was supposed to be the geometric and photometric error between the latest image and the image predicted by the network.

Thank you

kxhit commented 1 year ago

Hi, thanks for your interest in our work!

  1. Yes, the live demo is achieved with PyTorch multiprocessing: a visualization thread runs alongside mapping, takes the newest map (MLP) sent from the mapping thread, and renders the 2D & 3D results (a rough sketch is given after this list). We didn't optimise the marching cubes implementation, which could potentially be made more efficient by adopting PyTorch3D. The released code is single-threaded for simplicity.
  2. I'm not sure I follow the question about "my gt_depth and gt_rgb are always randomized images with different dimensions from the input". The loss is the depth, RGB, and object-mask error between the rendering and the GT, and we randomly sample pixels from the keyframe buffer, which always includes the latest frame. The reason for this is to keep a memory of the historical observations and avoid forgetting.
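
For illustration, here is a rough sketch of that two-process layout (not the released vMAP/iMAP demo code; the tiny MLP, dummy loss, and queue hand-off below are placeholders):

    # Minimal sketch (assumed, not the released demo code) of the two-process
    # layout: a mapping process keeps optimising an MLP and periodically hands
    # the newest weights to a visualisation process.
    import torch
    import torch.nn as nn
    import torch.multiprocessing as mp

    def make_mlp():
        # stand-in for the scene MLP; the real network is defined in the repo
        return nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 4))

    def mapping_loop(map_queue):
        model = make_mlp()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for step in range(1000):
            # dummy optimisation step standing in for the real rendering losses
            loss = model(torch.rand(1024, 3)).square().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            if step % 50 == 0:
                # send a CPU copy of the newest map to the visualiser
                map_queue.put({k: v.detach().cpu() for k, v in model.state_dict().items()})
        map_queue.put(None)  # signal the visualiser to stop

    def vis_loop(map_queue):
        model = make_mlp()
        while True:
            state = map_queue.get()          # newest map from the mapper
            if state is None:
                break
            model.load_state_dict(state)
            # ... render 2D previews / run marching cubes on `model` here ...

    if __name__ == "__main__":
        mp.set_start_method("spawn")         # safest start method with CUDA
        q = mp.Queue(maxsize=2)
        procs = [mp.Process(target=mapping_loop, args=(q,)),
                 mp.Process(target=vis_loop, args=(q,))]
        for p in procs:
            p.start()
        for p in procs:
            p.join()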

Please let me know if you need further help!

raabuchanan commented 1 year ago

Thank you for the response, I'll just clarify what I mean for question 2.

Basically I wanted to visualize the depth and RGB images used for computing the loss, so I look at gt_depth and gt_rgb by adding the following code:

                # assumes `import matplotlib.pyplot as plt` is available in this scope
                plt.subplot(2, 2, 1)
                plt.title('gt rgb image')
                plt.imshow(gt_rgb.cpu())    # move tensors to CPU before plotting
                plt.subplot(2, 2, 2)
                plt.title('gt depth image')
                plt.imshow(gt_depth.cpu())
                plt.subplot(2, 2, 3)
                plt.title('rgb image')
                plt.imshow(rgb.cpu())
                plt.subplot(2, 2, 4)
                plt.title('depth image')
                plt.imshow(depth.cpu())
                plt.show()

I get the attached output (Figure_1), which shows garbled images for gt_depth and gt_rgb. I would have expected the images generated by the MLP to look closer to the input images.

kxhit commented 1 year ago

The training samples are obtained from the function get_training_samples. The GT pixels are a subset (number = cfg.n_samples_per_frame) of the pixels in the training frames, and the training frames (number = cfg.win_size) are themselves sampled from the keyframe buffer. A visualisation of the training samples therefore won't look like an image; it is really a group of pixels drawn from the historical observations. If you want to compare a rendered image with the GT RGB, you need to render a whole image at a given pose, roughly as sketched below.
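
To make the shape of the batch concrete, here is a rough sketch with made-up names and sizes (not the actual get_training_samples code):

    # Rough sketch (hypothetical names and shapes): the training batch is a
    # scatter of pixels from several keyframes, so plotting it will never look
    # like one coherent image.
    import torch

    H, W, n_keyframes = 480, 640, 20
    keyframe_rgb = torch.rand(n_keyframes, H, W, 3)   # stand-in keyframe buffer
    win_size, n_samples_per_frame = 5, 120            # cf. cfg.win_size / cfg.n_samples_per_frame

    frame_ids = torch.randint(0, n_keyframes, (win_size,))
    us = torch.randint(0, W, (win_size, n_samples_per_frame))
    vs = torch.randint(0, H, (win_size, n_samples_per_frame))
    gt_rgb = keyframe_rgb[frame_ids[:, None], vs, us]  # (win_size, n_samples, 3): pixels, not an image

    # To get something that looks like an image, render every pixel of one
    # view at a fixed pose instead (pseudo-interface, not the repo's API):
    # rgb_img, depth_img = render_full_image(model, pose, H, W)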

kxhit commented 1 year ago

And reducing the mesh grid resolution here will also speed up the meshing.
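
For a rough sense of the trade-off (a toy sketch, not the repo's meshing code; extract_mesh and the toy SDF are placeholders): the number of MLP queries and the marching-cubes workload grow with the cube of the grid side length, so halving the resolution cuts the work by roughly 8x.

    # Minimal sketch: query a field on a regular grid and mesh it with
    # marching cubes. The number of queries is grid_dim**3, so the grid
    # resolution dominates meshing time.
    import torch
    from skimage import measure

    def extract_mesh(field_fn, grid_dim=128, bound=1.0):
        # field_fn: hypothetical callable mapping (N, 3) points -> (N,) values
        xs = torch.linspace(-bound, bound, grid_dim)
        grid = torch.stack(torch.meshgrid(xs, xs, xs, indexing="ij"), dim=-1)
        with torch.no_grad():
            vals = field_fn(grid.reshape(-1, 3)).reshape(grid_dim, grid_dim, grid_dim)
        verts, faces, _, _ = measure.marching_cubes(vals.numpy(), level=0.0)
        return verts, faces

    # e.g. grid_dim=64 instead of 256 reduces the queries (and meshing time)
    # by a factor of ~64, at the cost of a coarser mesh.
    sphere_sdf = lambda p: p.norm(dim=-1) - 0.5   # toy field standing in for the MLP
    verts, faces = extract_mesh(sphere_sdf, grid_dim=64)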

raabuchanan commented 1 year ago

Ah ok I think I understand now, thanks