Closed giannisbdk closed 1 year ago
Hi @johnbdk, the result of ResnetEncoder is not passed directly to PoseDecoder. PoseDecoder typically receives a list of tensor[N, 512, 1/32s, 1/32s] features. See: https://github.com/nianticlabs/monodepth2/blob/master/trainer.py#L262
Please try running the code and specify what options you are using.
@daniyar-niantic Well it seems that I did not see the outer square brackets in line https://github.com/nianticlabs/monodepth2/blob/master/trainer.py#L285 when investigating the code.
Below, I point out the exact line and its output, in case any future reader has the same question.
pose_inputs = [self.models["pose_encoder"](torch.cat(pose_inputs, 1))]
# My comment:
#
# This leaves the output of the encoder's forward method (when
# num_pose_frames == 2 and pose_model_type == 'shared', which means
# one frame at a time for the input of the decoder) looking like:
#
# list_of_list = [ [ tensor[N, 64, 1/2*s, 1/2*s],
#                    tensor[N, 64, 1/4*s, 1/4*s],
#                    tensor[N, 128, 1/8*s, 1/8*s],
#                    tensor[N, 256, 1/16*s, 1/16*s],
#                    tensor[N, 512, 1/32*s, 1/32*s] ] ]
# where s: original image size (h and w, respectively).
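The nesting described in the comment above can be illustrated with plain shape bookkeeping. This is only a sketch: `encoder_feature_shapes` and `RESNET18_CHANNELS` are hypothetical names, no real network is run, and the batch size and the 192 x 640 input are assumed values chosen for illustration.

```python
# Hypothetical helper: shape bookkeeping only, no real network is run.
# Channel counts per encoder stage, as listed in the comment above.
RESNET18_CHANNELS = [64, 64, 128, 256, 512]


def encoder_feature_shapes(n, h, w):
    """Return the (N, C, H, W) shape of each stage output for an h x w input."""
    shapes = []
    for stage, channels in enumerate(RESNET18_CHANNELS, start=1):
        scale = 2 ** stage  # stage 1 is at 1/2 resolution, stage 5 at 1/32
        shapes.append((n, channels, h // scale, w // scale))
    return shapes


# The outer square brackets wrap the encoder output in one more list,
# so the decoder input is a list of lists.
pose_inputs = [encoder_feature_shapes(n=12, h=192, w=640)]

for shape in pose_inputs[0]:
    print(shape)
```

The last entry of the inner list is the 512-channel map at 1/32 resolution, which is the one the first reply refers to.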
Thank you for the clarification!
Hello, I would like to follow up on issue #418, since it was closed without a detailed answer.
Indeed, the output of the ResNet encoder is a list containing the intermediate outputs of all the ResNet stages:
This list of features is then fed into the (pose) decoder network, whose forward method does the following:
However, I cannot see how self.relu(self.convs["squeeze"](f)) can work, since this was declared in the constructor like:

It seems that the PoseNetwork class constructor declares a convolution that expects an input volume with as many input channels as the ResNet's last stage, i.e. 512 (which indeed matches the paper, see Table 5). However, from my point of view, there is a mismatch between the pose network introduced in the paper and the actual implementation, regarding the forward method and the first convolutional operation. (?)

Please note that I did not run the code, so I cannot provide any other information. I have read the paper and tried to investigate the code. Also, I do not assume that my analysis is correct (I just described a point of view).
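The channel question can be checked with simple shape bookkeeping. The sketch below assumes, as the first reply describes, that the decoder applies the squeeze convolution only to the last (512-channel) element of each inner list. `conv_out_shape` is a hypothetical stand-in for a 1x1 convolution, and the shapes assume a batch of 12 and a 192 x 640 input.

```python
def conv_out_shape(in_shape, in_channels, out_channels):
    """Shape after a 1x1, stride-1 convolution; raises if channels do not match."""
    n, c, h, w = in_shape
    if c != in_channels:
        raise ValueError(f"expected {in_channels} input channels, got {c}")
    return (n, out_channels, h, w)


# One inner list per encoder pass (assumed shapes, see the note above).
input_features = [[
    (12, 64, 96, 320), (12, 64, 48, 160), (12, 128, 24, 80),
    (12, 256, 12, 40), (12, 512, 6, 20),
]]

# Assumption: the decoder keeps only the deepest feature map of each inner list.
last_features = [f[-1] for f in input_features]
squeezed = [conv_out_shape(f, 512, 256) for f in last_features]
print(squeezed)

# Applying the same 512-input layer to a shallower stage would fail.
try:
    conv_out_shape(input_features[0][0], 512, 256)
except ValueError as err:
    mismatch = str(err)
    print(mismatch)
```

Under that assumption there is no mismatch with the paper: a convolution declared with 512 input channels never sees the shallower 64-, 128- or 256-channel maps.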