zhyever closed this issue 3 years ago
Our BTS decoder is:
self.decoder = bts(params, [64, 128, 256, 512, 1024], params.bts_size)
I wonder why the ResNet-50 outputs have 64, 128, 256, 512, 1024 channels. The set 64, 256, 512, 1024, 2048 is more standard. Since I was only referring to your code at TransDepth/pytorch/bts.py, line 347, does that mean TransDepth is based on a ResNet-50 whose outputs have 64, 256, 512, 1024, 2048 channels and, as you say, the baseline is based on a ResNet-50 whose outputs have 64, 128, 256, 512, 1024 channels?
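For reference, here is a minimal sketch of where the "standard" ResNet-50 channel counts come from, and of how they differ from the list passed to the decoder above. It assumes the usual ResNet-50 layout (a 64-channel stem followed by four bottleneck stages with an expansion factor of 4). The function and variable names are hypothetical and are not taken from the TransDepth or BTS code.

```python
# Assumption: standard ResNet-50 layout, i.e. a 64-channel stem followed by
# four stages of bottleneck blocks whose output width is 4x the base width.
STEM_CHANNELS = 64
STAGE_BASE_WIDTHS = [64, 128, 256, 512]
BOTTLENECK_EXPANSION = 4


def resnet50_feature_channels():
    """Channel counts of the stem output and of each of the four stages."""
    stages = [w * BOTTLENECK_EXPANSION for w in STAGE_BASE_WIDTHS]
    return [STEM_CHANNELS] + stages


def mismatched_levels(encoder_channels, decoder_channels):
    """Indices of the feature levels where the two channel lists disagree."""
    return [
        i
        for i, (enc, dec) in enumerate(zip(encoder_channels, decoder_channels))
        if enc != dec
    ]


# The list passed to bts(...) in the snippet quoted in this thread.
decoder_channels = [64, 128, 256, 512, 1024]

encoder_channels = resnet50_feature_channels()
print(encoder_channels)  # [64, 256, 512, 1024, 2048]
print(mismatched_levels(encoder_channels, decoder_channels))  # [1, 2, 3, 4]
```

Only the stem level agrees between the two lists, so a decoder built with one list would need an encoder that actually emits those widths.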
Sorry to bother you. I got the reason! Thanks a lot for explaining in detail.
Hi zhyever, same question here: what's the reason? It seems the decoder actually has sizes [64, 256, 512, 1024, 2048] rather than [64, 128, 256, 512, 1024].
I sent you an email. Sorry for the slow reply.
To fit our ResNet-50's output sizes, we changed the decoder parameter.
Hi, thanks for the great work. While reading your paper, I found the statement "We choose the ResNet-50 with the same prediction head as our baseline", but the paper says nothing about the design of the "decoder head", so I came to GitHub to figure it out. I see that your method is based on BTS and uses its decoder:
https://github.com/ygjwd12345/TransDepth/blob/3ae116f045243f24c72a4fc558634d0cf823fd1b/pytorch/bts.py#L347
So, "We choose the ResNet-50 with the same prediction head as our baseline" means you replace the BTS encoder with ResNet-50 and keep all other settings the same. I recently reproduced BTS with the official code, so I am somewhat familiar with its quantitative results. Although your baseline result on the NYU dataset is similar to the one reported in BTS, on KITTI your baseline result is much worse than the one reported in BTS, as follows:
NYU (Abs Rel, RMSE, a1, a2, a3):
Your report: 0.118 0.414 0.866 0.979 0.995 (TransDepth, Table 2, Baseline)
BTS report: 0.119 0.419 0.865 0.975 0.993 (BTS, Table 5, ResNet-50)

KITTI (Abs Rel, RMSE, a1, a2, a3):
Your report: 0.106 3.981 0.888 0.967 0.986 (TransDepth, Table 1, Baseline)
BTS report: 0.061 2.803 0.954 0.992 0.998 (BTS, Table 6, ResNet-50)
May I ask whether I misunderstood something, or did you use a different setting from BTS?
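To make the size of the gap concrete, the numbers quoted above can be compared as relative differences. This is only a rough sketch over the five values per row listed in this post. The dictionary layout and the `relative_gap` helper are hypothetical names introduced for illustration.

```python
# Metric order follows the post: Abs Rel, RMSE, a1, a2, a3.
METRICS = ["abs_rel", "rmse", "a1", "a2", "a3"]

# Values copied from the two tables quoted in this post.
REPORTED = {
    "NYU": {
        "transdepth_baseline": [0.118, 0.414, 0.866, 0.979, 0.995],
        "bts_resnet50": [0.119, 0.419, 0.865, 0.975, 0.993],
    },
    "KITTI": {
        "transdepth_baseline": [0.106, 3.981, 0.888, 0.967, 0.986],
        "bts_resnet50": [0.061, 2.803, 0.954, 0.992, 0.998],
    },
}


def relative_gap(dataset):
    """Per-metric (baseline - BTS) / BTS for one dataset."""
    rows = REPORTED[dataset]
    return {
        name: (ours - ref) / ref
        for name, ours, ref in zip(
            METRICS, rows["transdepth_baseline"], rows["bts_resnet50"]
        )
    }


for dataset in REPORTED:
    gaps = relative_gap(dataset)
    print(dataset, {name: f"{gap:+.1%}" for name, gap in gaps.items()})
```

For Abs Rel and RMSE a positive gap means a larger error than BTS reports, while for a1, a2, a3 a negative gap means lower accuracy.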