qianyu-dlut / MVANet

MIT License

inf_MCRM and MCRM weight names discrepancy #3

Open piercus opened 3 months ago

piercus commented 3 months ago

@qianyu-dlut thanks for this great work, I have one more question regarding the MCRM module.

In MCRM, the layers are named linear1 and linear2 here

In inf_MCRM, the same layers are named linear3 and linear4 here

In Model_80.pth, there are 4 different linear layers: linear1 / linear2 / linear3 / linear4

dec_blk1.linear1.weight
dec_blk1.linear1.bias
dec_blk1.linear2.weight
dec_blk1.linear2.bias
dec_blk1.linear3.weight
dec_blk1.linear3.bias
dec_blk1.linear4.weight
dec_blk1.linear4.bias

and the values are different:

import torch

# load the released checkpoint and compare the summed weights of linear1 and linear3
pretrained_dict = torch.load("./saved_model/MVANet/Model_80.pth", map_location='cuda')
print('dec_blk1.linear1.weight', torch.sum(pretrained_dict['dec_blk1.linear1.weight']))
print('dec_blk1.linear3.weight', torch.sum(pretrained_dict['dec_blk1.linear3.weight']))

outputs

dec_blk1.linear1.weight tensor(2.0187, device='cuda:0')
dec_blk1.linear3.weight tensor(-0.5632, device='cuda:0')

What is the difference between linear1 and linear3?

Thanks for your help

htrvu commented 3 months ago

The pretrained weights also contain some strange keys, e.g. multifieldcrossatt.attention.5 and dec_blk1.attention.4, which are not present in the model state dict (multifieldcrossatt only has attention.0 to attention.4). Were the pretrained weights trained on a larger model architecture?
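
For reference, a minimal sketch along the following lines can list the checkpoint entries the model does not expect (the MVANet import path and default constructor here are assumptions; adjust them to your local checkout):

import torch
from model.MVANet import MVANet  # assumed import path; adjust to your checkout

# build the model (assumed default constructor) and load the released checkpoint on CPU
model = MVANet()
pretrained_dict = torch.load("./saved_model/MVANet/Model_80.pth", map_location='cpu')

# keys present in the checkpoint but absent from the model's own state dict
model_keys = set(model.state_dict().keys())
extra_keys = sorted(k for k in pretrained_dict if k not in model_keys)
print('\n'.join(extra_keys))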

qianyu-dlut commented 3 months ago

Hi @piercus ! Thank you for bringing this to our attention. There were indeed two additional linear initializations when training Model_80.pth, namely linear1 and linear2, which were not utilized in the subsequent operations. This oversight has led to the situation you've observed.

Furthermore, you are correct regarding the naming convention within the MCRM; linear1 and linear2 should indeed be renamed to linear3 and linear4 to maintain consistency throughout the codebase. We appreciate your keen eye on this matter.
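
In case it helps others reading this thread, a hypothetical loading sketch (not the official fix) for a local MCRM that still uses the linear1/linear2 names would be to drop the unused dec_blk*.linear1/linear2 entries from the checkpoint and remap linear3/linear4 onto the names the module expects (this assumes the dec_blk* modules are the MCRM instances):

import torch

pretrained_dict = torch.load("./saved_model/MVANet/Model_80.pth", map_location='cpu')
remapped = {}
for k, v in pretrained_dict.items():
    if k.startswith('dec_blk') and ('.linear1.' in k or '.linear2.' in k):
        continue  # skip the extra initializations that were never used during training
    if k.startswith('dec_blk'):
        # map the weights actually used at inference onto the MCRM's linear1/linear2 names
        k = k.replace('.linear3.', '.linear1.').replace('.linear4.', '.linear2.')
    remapped[k] = v
# model.load_state_dict(remapped, strict=False)  # strict=False tolerates any other unused keys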

qianyu-dlut commented 3 months ago

Hi @htrvu ! Thank you for your careful observation regarding the pretrained weights. There were additional attention modules initialized during the training phase. These modules, however, were not utilized in the forward pass and, as a result, they were omitted from the code for the sake of simplicity and clarity. We apologize for any confusion this may have caused.
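
For anyone who hits load_state_dict errors because of these extra keys, a common workaround sketch (assuming model is an already-instantiated MVANet) is to keep only the checkpoint entries whose key and shape match the model's own state dict, merge them in, and load the result:

import torch

pretrained_dict = torch.load("./saved_model/MVANet/Model_80.pth", map_location='cpu')
model_dict = model.state_dict()

# keep only entries whose key and shape match the current model
filtered = {k: v for k, v in pretrained_dict.items()
            if k in model_dict and v.shape == model_dict[k].shape}

model_dict.update(filtered)
model.load_state_dict(model_dict)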