nianticlabs / manydepth

[CVPR 2021] Self-supervised depth estimation from short sequences

Input channels for input_features for PoseDecoder #41

Open benhgm opened 3 years ago

benhgm commented 3 years ago

Hi, I have a question about the input_features passed to the PoseDecoder network.

From the line below, the PoseDecoder accepts an input feature whose number of channels equals self.num_ch_enc[-1], which, according to the ResnetMultiImageInput encoder, should be 512:

self.convs[("squeeze")] = nn.Conv2d(self.num_ch_enc[-1], 256, 1)

However, the output features of the ResnetEncoder have the following shapes:

torch.Size([1, 64, 320, 96])
torch.Size([1, 64, 160, 48])
torch.Size([1, 128, 80, 24])
torch.Size([1, 256, 40, 12])
torch.Size([1, 512, 20, 6])

Does this mean that only the last element of the features array is accepted by the PoseDecoder?
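To make the channel mismatch concrete, here is a minimal shape-only sketch in plain Python (no PyTorch needed). It is not code from the repository: the helper conv1x1_output_shape is hypothetical, and num_ch_enc is assumed to be the channel counts read off the shapes listed above. It only checks which feature maps a 1x1 convolution with self.num_ch_enc[-1] input channels could accept.

```python
# Shapes reported above for the five encoder feature maps, as (N, C, H, W).
feature_shapes = [
    (1, 64, 320, 96),
    (1, 64, 160, 48),
    (1, 128, 80, 24),
    (1, 256, 40, 12),
    (1, 512, 20, 6),
]

# Assumption: num_ch_enc holds the channel count of each encoder stage,
# so num_ch_enc[-1] is the input width of the "squeeze" convolution.
num_ch_enc = [shape[1] for shape in feature_shapes]
squeeze_in_channels = num_ch_enc[-1]
squeeze_out_channels = 256


def conv1x1_output_shape(input_shape, in_channels, out_channels):
    """Hypothetical helper: output shape of a 1x1 convolution (stride 1).

    Raises ValueError when the input channel count does not match,
    which is the same condition under which nn.Conv2d would fail.
    """
    n, c, h, w = input_shape
    if c != in_channels:
        raise ValueError(f"expected {in_channels} channels, got {c}")
    return (n, out_channels, h, w)


# Collect the indices of the feature maps the squeeze convolution can accept.
compatible = []
for index, shape in enumerate(feature_shapes):
    try:
        conv1x1_output_shape(shape, squeeze_in_channels, squeeze_out_channels)
        compatible.append(index)
    except ValueError:
        pass

squeezed_shape = conv1x1_output_shape(
    feature_shapes[-1], squeeze_in_channels, squeeze_out_channels
)

print(compatible)      # [4]
print(squeezed_shape)  # (1, 256, 20, 6)
```

Under these assumptions, only the last feature map (index 4, with 512 channels) matches the squeeze convolution, which is what prompts the question.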

Perhaps I am misreading the code, so I would appreciate it if anyone could explain this to me. Thank you so much!