saimwani / multiON

Code for reproducing the results of NeurIPS 2020 paper "MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation"

function:ProjectToGroundPlane #12

Closed: Hoyyyaard closed this issue 2 years ago

Hoyyyaard commented 2 years ago

Hello, can you give me some details about the function ProjectToGroundPlane(), or what the purpose of this function is? Thank you!

shivanshpatel35 commented 2 years ago

Hi, ProjectToGroundPlane takes two inputs: (1) the convolutional output of shape (h, w, c), and (2) for each spatial location (h, w) of that output, the corresponding egocentric grid location. Using these, it projects the image features onto the egocentric grid. A high-level overview of this process is shown in figure 2 of the MultiON paper.
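For intuition, here is a minimal sketch of that projection step (not the code in this repository; the function and argument names are hypothetical), assuming the egocentric grid cell for each spatial location has already been computed:

```python
import torch

def project_to_ground_plane_sketch(conv_feats, grid_rc, grid_size=28):
    """Scatter (h, w, c) image features onto an egocentric top-down grid.

    conv_feats: (h, w, c) convolutional features for one frame.
    grid_rc:    (h, w, 2) long tensor with the egocentric grid cell
                (row, col) of each spatial location, values in [0, grid_size).
    Returns:    (grid_size, grid_size, c) projected feature map.
    """
    h, w, c = conv_feats.shape
    flat_feats = conv_feats.reshape(-1, c)                         # (h*w, c)
    flat_idx = (grid_rc[..., 0] * grid_size + grid_rc[..., 1]).reshape(-1)

    proj = torch.zeros(grid_size * grid_size, c)
    counts = torch.zeros(grid_size * grid_size)
    # Accumulate features per cell, then average cells hit by several pixels.
    proj.index_add_(0, flat_idx, flat_feats)
    counts.index_add_(0, flat_idx, torch.ones(h * w))
    proj = proj / counts.clamp(min=1).unsqueeze(1)
    return proj.reshape(grid_size, grid_size, c)
```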

Hoyyyaard commented 2 years ago

Thanks for your answer! But I still have some questions: 1. Is there any demo that uses the Projection class? 2. Does the output of the MapCNN class contribute to the first input of ProjectToGroundPlane (conv)? 3. Does the function "rotate_tensor", which is called after "ProjectToGroundPlane", correspond to the process of registering the egocentric map to the global map in figure 2 of the MultiON paper?

shivanshpatel35 commented 2 years ago

  1. We don't have any demo for Projection.
  2. ProjectToGroundPlane projects visual features onto the top-down grid. This top-down grid (or map) is then passed through MapCNN to obtain the map features v_m (refer to figure 1 in the paper).
  3. rotate_tensor is used twice. First, it converts the egocentric map to the global frame for registration (as seen in figure 2). Then it is used again to obtain the egocentric map m_t from the global map M_t (refer to figure 1).
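For intuition about the registration step, here is a rough sketch of rotating a batch of egocentric maps by the agent's heading using an affine grid and bilinear sampling (hypothetical names, not the rotate_tensor in this repository):

```python
import torch
import torch.nn.functional as F

def rotate_map_sketch(maps, heading):
    """Rotate top-down maps by the agent's heading (radians, per batch element).

    maps:    (bs, c, H, W) map tensor.
    heading: (bs,) rotation angle for each map.
    """
    cos, sin = torch.cos(heading), torch.sin(heading)
    zero = torch.zeros_like(cos)
    # One 2x3 affine matrix per batch element (rotation about the map center).
    theta = torch.stack(
        [torch.stack([cos, -sin, zero], dim=1),
         torch.stack([sin,  cos, zero], dim=1)], dim=1)   # (bs, 2, 3)
    grid = F.affine_grid(theta, maps.size(), align_corners=False)
    return F.grid_sample(maps, grid, align_corners=False)
```

The sign of the angle and the interpolation settings depend on the coordinate conventions in use, so treat this only as an illustration of the mechanism.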
Hoyyyaard commented 2 years ago

OK, thanks! So from which function can I get the image features from RGBD?

shivanshpatel35 commented 2 years ago

This line gets image features from RGBD.
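As a generic illustration (an assumption for context, not the repository's exact encoder), spatial features of this kind can be obtained with a small CNN over a 4-channel RGBD observation:

```python
import torch
import torch.nn as nn

# Hypothetical encoder: 4-channel RGBD in, spatial (h, w) feature map out.
rgbd_encoder = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=3, stride=1), nn.ReLU(),
)

rgbd = torch.rand(1, 4, 256, 256)       # (bs, 4, H, W) RGBD observation
feats = rgbd_encoder(rgbd)              # (bs, 32, 28, 28) spatial features
feats = feats.permute(0, 2, 3, 1)       # (bs, h, w, c) layout for projection
```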

Hoyyyaard commented 2 years ago

I understand most of the Projection now, thanks to your help. Thank you very much! But I am not sure about the parameter "heading" in the function "RotateTensor". Does it mean the forward direction of the agent? And what is its size?

shivanshpatel35 commented 2 years ago

Heading is the angle of the agent in the top-down view. Think of it as the yaw angle of a 3-D rotation.

Hoyyyaard commented 2 years ago

So from which function can I get the heading? Is it from a measurement/metric in Habitat?

Hoyyyaard commented 2 years ago

And proj_feats from ProjectToGroundPlane has a shape of (bs, 32, 256, 256). How can I visualize it with cv2.imshow? Or how can I transform it to 3 channels or 1 channel?

shivanshpatel35 commented 2 years ago

Heading is obtained from here. proj_feats has the shape (bs, 32, 28, 28). One way to visualize it is to sum over the channel dimension, which gives a tensor of shape (bs, 1, 28, 28). You can then visualize each (28, 28) map as a grayscale image or heatmap.
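A minimal sketch of that visualization with cv2 (the dummy tensor below is only a stand-in for the real proj_feats):

```python
import cv2
import numpy as np
import torch

proj_feats = torch.rand(1, 32, 28, 28)                    # stand-in for the real features
heat = proj_feats.sum(dim=1)[0].detach().cpu().numpy()    # (28, 28), summed over channels
# Normalize to [0, 255] so cv2 can show it as a grayscale image.
heat = 255 * (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
heat = cv2.resize(heat.astype(np.uint8), (280, 280), interpolation=cv2.INTER_NEAREST)
cv2.imshow("proj_feats", heat)   # or cv2.applyColorMap(heat, cv2.COLORMAP_JET) for a heatmap
cv2.waitKey(0)
```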