wgcban / ChangeFormer

[IGARSS'22]: A Transformer-Based Siamese Network for Change Detection
https://www.wgcban.com/research#h.e51z61ujhqim
MIT License

the pretrained model load issue #70

Closed · money6651626 closed 1 year ago

money6651626 commented 1 year ago

Hi, I found an issue: your pretrained model parameters (trained on the ADE dataset) seem to be of no use, because most of the parameters do not match the model (the weight shapes differ), even though they have the same keys, and `load_state_dict(torch.load(self.args.pretrain), strict=False)` lets the load go through. I counted the number of pretrained parameters that can be loaded correctly: only about 6 items.
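For anyone who wants to reproduce this check on their own checkpoint, here is a minimal sketch (not code from this repo; the `model_G_state_dict` wrapper key is an assumption about the checkpoint format) that counts how many tensors would actually load by matching both key and shape:

```python
import torch

def count_loadable_params(model, ckpt_path):
    """Report how many checkpoint tensors match `model` by key AND shape."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Some released checkpoints may wrap the weights in an outer dict;
    # the 'model_G_state_dict' key here is an assumption, not confirmed.
    state = ckpt.get("model_G_state_dict", ckpt)
    model_state = model.state_dict()
    loadable = {k: v for k, v in state.items()
                if k in model_state
                and torch.is_tensor(v)
                and v.shape == model_state[k].shape}
    print(f"{len(loadable)} / {len(state)} tensors match by key and shape")
    return loadable
```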

wgcban commented 1 year ago

@money6651626 Hi, thanks for your question. I agree with your observation; a few others have also raised this issue in the past. I would like to bring to your attention that none of the models provided here utilized the model pre-trained on the ADE dataset. Instead, we first trained the model on the LEVIR dataset starting from random initialization, and then used this model as the pre-trained model when fine-tuning on other datasets such as DSIFN to speed up the training. If you are training the model on a separate dataset, you can use any of the pre-trained models provided here as the starting point, which will help with faster convergence. Thanks.
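To make that fine-tuning workflow concrete, a hedged sketch of initializing a model from one of the released LEVIR checkpoints (again, the `model_G_state_dict` key is an assumption about the checkpoint format; only tensors matching by key and shape are merged, so mismatched heads simply keep their random initialization):

```python
import torch

def init_from_checkpoint(model, ckpt_path):
    """Initialize `model` from a released checkpoint, keeping only the
    tensors that match by key and shape; returns the skipped keys."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("model_G_state_dict", ckpt)  # unwrap if wrapped (assumption)
    model_state = model.state_dict()
    filtered = {k: v for k, v in state.items()
                if k in model_state
                and torch.is_tensor(v)
                and v.shape == model_state[k].shape}
    skipped = sorted(set(state) - set(filtered))
    # Merge the compatible tensors into the model's own state dict and
    # load the merged dict strictly, so nothing fails silently.
    model_state.update(filtered)
    model.load_state_dict(model_state)
    return skipped
```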

money6651626 commented 1 year ago

Sir, are you saying that the excellent metrics on the DSIFN-CD dataset reported in the paper came from fine-tuning ChangeFormerV6 starting from the weights trained on the LEVIR-CD dataset? If so, this might not be a fair comparison with other models.

wgcban commented 1 year ago

https://github.com/wgcban/ChangeFormer/releases/download/v0.1.0/CD_ChangeFormerV6_LEVIR_b16_lr0.0001_adamw_train_test_200_linear_ce_multi_train_True_multi_infer_False_shuffle_AB_False_embed_dim_256.zip