ionut-grigore99 opened this issue 11 months ago.
The file args_files/hisfog/kitti/effb5_320x1024.txt is the args file for KITTI (EfficientNet-B5). So which args file did you use?
I have the same issue. I used args_files/hisfog/kitti/effb5_320x1024.txt, and my args are below.
--load_pretrained_model --load_pt_folder /SfMNeXt-Impl/models/pretrained/KITTI_effb5_320x1024 --image_path /SfMNeXt-Impl/images/231121_E100#3_KATRI/1m_10_start6.jpg --log_dir /SfMNeXt-Impl/logs --model_name effb5_320x1024 --dataset kitti --eval_split eigen --backbone tf_efficientnet_b5_ap --height 320 --width 1024 --batch_size 16 --num_epochs 25 --scheduler_step_size 15 --model_dim 32 --patch_size 32 --dim_out 128 --query_nums 128 --dec_channels 512 256 128 64 32 --min_depth 0.001 --max_depth 80.0 --diff_lr --use_stereo --eval_mono --post_process --save_pred_disps
Below is the error message:
Traceback (most recent call last):
File "test_simple_SQL_config.py", line 254, in
RuntimeError: mat1 dim 1 must match mat2 dim 0
What is your input image size? Does it satisfy H*W / patch_size^2 >= query_nums?
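The condition asked about above is simple arithmetic, so it can be checked before running the model. A minimal sketch in plain Python; the helper names are hypothetical, and it assumes H and W are the spatial size of the map the patch embedding actually sees.

```python
def patch_ratio(height: int, width: int, patch_size: int) -> float:
    """H * W / patch_size^2, i.e. how many patch_size x patch_size patches cover the map."""
    return height * width / patch_size ** 2


def has_enough_patches(height: int, width: int, patch_size: int, query_nums: int) -> bool:
    """True if the map provides at least query_nums patches to form queries from."""
    return patch_ratio(height, width, patch_size) >= query_nums


# --height 320 --width 1024 --patch_size 32 --query_nums 128, from the args above
print(patch_ratio(320, 1024, 32))              # 320.0
print(has_enough_patches(320, 1024, 32, 128))  # True
```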
I tested images of several sizes (from 1920x1080 down to 1024x320), and all of them satisfy H*W / patch_size^2 >= query_nums. While analyzing this error, I found a layer whose expected input size differs from the size of the tensor it actually receives.
networks/depth_decoder_QTR.py, line 22:
self.bins_regressor = nn.Sequential(nn.Linear(embedding_dim * query_nums, 16 * query_nums), nn.LeakyReLU(), nn.Linear(16 * query_nums, 16 * 16), nn.LeakyReLU(), nn.Linear(16 * 16, dim_out))
Line 54:
y = self.bins_regressor(summarys.view(bs, Q * E))
Q * E is not the same as embedding_dim * query_nums.
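This is why the traceback reads "mat1 dim 1 must match mat2 dim 0": the first nn.Linear multiplies the flattened (bs, Q * E) tensor by a weight whose input size is embedding_dim * query_nums, so the two products must be equal. A plain-Python sketch of that check; the helper names are hypothetical and the numbers are illustrative, not values printed from the model.

```python
from typing import Tuple


def regressor_input_sizes(q: int, e: int, embedding_dim: int, query_nums: int) -> Tuple[int, int]:
    """Return (got, expected) for the first nn.Linear of bins_regressor.

    got:      dim 1 of summarys.view(bs, Q * E), i.e. Q * E (line 54)
    expected: in_features of nn.Linear(embedding_dim * query_nums, ...) (line 22)
    """
    return q * e, embedding_dim * query_nums


def regressor_input_matches(q: int, e: int, embedding_dim: int, query_nums: int) -> bool:
    got, expected = regressor_input_sizes(q, e, embedding_dim, query_nums)
    return got == expected


# Illustrative numbers only, not measured from the repo:
print(regressor_input_sizes(128, 32, 128, 128))     # (4096, 16384) -> mismatch
print(regressor_input_matches(128, 128, 128, 128))  # True
```

In the real model, printing summarys.shape just before line 54 gives the actual Q and E to compare against embedding_dim and query_nums.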
Thank you for your quick response.
class DecoderBN(nn.Module):
    def __init__(self, num_features=2048, num_classes=1, bottleneck_features=2048):
        super(DecoderBN, self).__init__()
        features = int(num_features)
        self.conv2 = nn.Conv2d(bottleneck_features, features, kernel_size=1, stride=1, padding=1)
        self.up1 = UpSampleBN(skip_input=features // 1 + 112 + 64, output_features=features // 2)
        self.up2 = UpSampleBN(skip_input=features // 2 + 40 + 24, output_features=features // 4)
        self.up3 = UpSampleBN(skip_input=features // 4 + 24 + 16, output_features=features // 8)
        self.up4 = UpSampleBN(skip_input=features // 8 + 16 + 8, output_features=features // 16)
        self.up5 = UpSampleBN(skip_input=features // 16 + 3, output_features=features // 16)
        self.conv3 = nn.Conv2d(features // 16, num_classes, kernel_size=3, stride=1, padding=1)
        # self.act_out = nn.Softmax(dim=1) if output_activation == 'softmax' else nn.Identity()

    def forward(self, features):
        x_block0, x_block1, x_block2, x_block3, x_block4 = features[4], features[5], features[6], features[8], features[11]
        x_d0 = self.conv2(x_block4)
        x_d1 = self.up1(x_d0, x_block3)
        x_d2 = self.up2(x_d1, x_block2)
        x_d3 = self.up3(x_d2, x_block1)
        x_d4 = self.up4(x_d3, x_block0)
        x_d5 = self.up5(x_d4, features[0])
        out = self.conv3(x_d5)
        # out = self.conv3(x_d4)
        # out = self.act_out(out)
        # if with_features:
        #     return out, features[-1]
        # elif with_intermediate:
        #     return out, [x_block0, x_block1, x_block2, x_block3, x_block4, x_d1, x_d2, x_d3, x_d4]
        return out
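The per-stage channel counts in this DecoderBN can be tabulated directly from its __init__. A small sketch that only reproduces that arithmetic; it assumes (UpSampleBN is not shown here) that each skip_input is the upsampled decoder channels plus the channels of the encoder feature concatenated at that stage, with the two hard-coded numbers summing to the latter. The helper name is hypothetical.

```python
from typing import List, Tuple


def decoder_bn_channels(num_features: int = 2048) -> List[Tuple[int, int]]:
    """(skip_input, output_features) for up1..up5, copied from DecoderBN.__init__."""
    features = int(num_features)
    return [
        (features // 1 + 112 + 64, features // 2),
        (features // 2 + 40 + 24, features // 4),
        (features // 4 + 24 + 16, features // 8),
        (features // 8 + 16 + 8, features // 16),
        (features // 16 + 3, features // 16),  # last stage concatenates the 3-channel input
    ]


for i, (skip_in, out) in enumerate(decoder_bn_channels(), start=1):
    print(f"up{i}: skip_input={skip_in}, output_features={out}")
# up1: skip_input=2224, output_features=1024
# up2: skip_input=1088, output_features=512
# up3: skip_input=552, output_features=256
# up4: skip_input=280, output_features=128
# up5: skip_input=131, output_features=128
```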
My problem is that when I simply try to load the pretrained weights provided in the repo, the keys of BaseEncoder don't seem to match the keys in the provided weights. This is all I did:
model = BaseEncoder.build(num_features=256, model_dim=32)
model.from_pretrained(weights_path='/home/Desktop/SQLdepth/src/pretrained/KITTI_EfficientNetB5_320x1024/encoder.pth', device='cpu')
where from_pretrained is defined as:
def from_pretrained(self, weights_path, device='cpu'):
    loaded_dict_enc = torch.load(weights_path, map_location=device)
    filtered_dict_enc = {k: v for k, v in loaded_dict_enc.items() if k in self.state_dict()}
    self.load_state_dict(filtered_dict_enc)
    self.eval()
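To see exactly which keys disagree, it helps to diff the two key sets instead of filtering them. A minimal sketch in plain Python; the helper name and the key names in the example are hypothetical.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) key lists for a checkpoint vs. a model."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)     # the model expects these; the checkpoint lacks them
    unexpected = sorted(ckpt_set - model_set)  # the checkpoint has these; the model does not
    return missing, unexpected


# Hypothetical key names, for illustration only.
model_keys = ["stem.weight", "blocks.0.weight", "head.weight"]
checkpoint_keys = ["encoder.stem.weight", "blocks.0.weight", "decoder.weight"]
missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print(missing)     # ['head.weight', 'stem.weight']
print(unexpected)  # ['decoder.weight', 'encoder.stem.weight']
```

With PyTorch, the inputs would be self.state_dict().keys() and torch.load(weights_path, map_location=device).keys(). The filter in from_pretrained silently drops every unexpected key, after which the default strict=True load_state_dict still raises on the missing ones; calling load_state_dict(..., strict=False) instead returns an object whose missing_keys and unexpected_keys attributes hold the same information.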
The pretrained KITTI EfficientNet-B5 model does not use BaseEncoder as its backbone; it uses Unet (--backbone tf_efficientnet_b5_ap). So you should use args_files/hisfog/kitti/effb5_320x1024.txt for the KITTI EfficientNet-B5 model.
So in this case, will the feature map from the EfficientNet encoder have shape (c, h, w) or (c, h/2, w/2)? The first shape seems to be what gets printed now.
The KITTI EfficientNet-B5 model does not use this DecoderBN. The encoder should be Unet, NOT BaseEncoder, and there is NO DecoderBN:
self.encoder = networks.Unet(pretrained=(not opt.load_pretrained_model), backbone=opt.backbone, in_channels=3, num_classes=opt.model_dim, decoder_channels=opt.dec_channels)
@ionut-grigore99 @Choi-YeongJoon
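The networks.Unet(...) call above takes its arguments from the args file. A minimal sketch of how the relevant flags from the posted args map onto those keyword arguments, using argparse as a stand-in for the repo's own option parser (which is not shown here); only the flag names and keyword names come from this thread, and the sketch does not build the network.

```python
import argparse
import shlex


def parse_backbone_args(argv):
    """Hypothetical stand-in parser covering only the flags that feed networks.Unet."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--backbone")
    parser.add_argument("--model_dim", type=int)
    parser.add_argument("--dec_channels", type=int, nargs="+")
    parser.add_argument("--load_pretrained_model", action="store_true")
    opt, _unknown = parser.parse_known_args(argv)  # ignore all other flags in the args file
    return opt


argv = shlex.split(
    "--load_pretrained_model --backbone tf_efficientnet_b5_ap "
    "--model_dim 32 --dec_channels 512 256 128 64 32"
)
opt = parse_backbone_args(argv)
unet_kwargs = dict(
    pretrained=(not opt.load_pretrained_model),  # False when the KITTI weights are loaded afterwards
    backbone=opt.backbone,
    in_channels=3,
    num_classes=opt.model_dim,
    decoder_channels=opt.dec_channels,
)
print(unet_kwargs["pretrained"], unet_kwargs["backbone"])  # False tf_efficientnet_b5_ap
print(unet_kwargs["decoder_channels"])                      # [512, 256, 128, 64, 32]
```

Passing pretrained=False when --load_pretrained_model is set presumably avoids fetching ImageNet backbone weights that the KITTI checkpoint would overwrite anyway.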
It works fine now, but the resulting feature map appears to have the same resolution as the input. In contrast, for ConvNeXt and ResNet the resolution is halved, as stated in the paper. I just want to know whether this is the expected behaviour. Thanks!
Yes, that's expected!
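For the 320x1024 KITTI setting, the two cases discussed above work out as follows. A minimal sketch; the helper name is hypothetical, and it assumes the Unet's num_classes (set to opt.model_dim in the constructor call earlier in this thread) is the channel count of its output feature map. The half-resolution line keeps 32 channels only for comparison; the actual ConvNeXt and ResNet channel counts are not stated here.

```python
from typing import Tuple


def expected_feature_shape(channels: int, height: int, width: int, halved: bool) -> Tuple[int, int, int]:
    """(c, h, w) of the per-pixel feature map: full input resolution, or H/2 x W/2."""
    if halved:
        return (channels, height // 2, width // 2)
    return (channels, height, width)


# EfficientNet-B5 Unet at full resolution, assuming channels == --model_dim 32
print(expected_feature_shape(32, 320, 1024, halved=False))  # (32, 320, 1024)
# A half-resolution map, as described above for ConvNeXt and ResNet
print(expected_feature_shape(32, 320, 1024, halved=True))   # (32, 160, 512)
```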
It works fine for me now, too. Thanks a lot, @hisfog!
Hi!
When I attempt to load the pretrained weights you provided for EfficientNetB5, some keys in the state_dict appear not to match. Loading the weights was straightforward for ResNet50 and ConvNeXt, but not for EfficientNetB5.