w1oves / Rein

[CVPR 2024] Official implementation of <Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation>
https://zxwei.site/rein
GNU General Public License v3.0
215 stars 19 forks

Issue when training on my own binary segmentation dataset #24

crystalline02 closed this issue 5 months ago

crystalline02 commented 5 months ago

Hello,

Thank you very much for your paper and the open-source code. I've run into an issue when using REIN on my own dataset and was hoping you could help me resolve it. I want to apply REIN to a binary segmentation dataset, so I changed num_classes to 2 in configs/_base_/models/rein_dinov2_mask2former.py. Following your instructions, I trained the model with the command:

python tools/train.py configs/dinov2/rein_dinov2_mask2former_512x512_bs1x4.py

However, I encountered a size-mismatch error:

size mismatch for patch_embed.proj.weight: copying a param with shape torch.Size([1024, 3, 16, 16]) from checkpoint, the shape in current model is torch.Size([1024, 1, 16, 16]).
missing keys in source state_dict: reins.scale, reins.learnable_tokens_a, reins.learnable_tokens_b, reins.mlp_token2feat.weight, reins.mlp_token2feat.bias, reins.mlp_delta_f.weight, reins.mlp_delta_f.bias, reins.transform.weight, reins.transform.bias, reins.merge.weight, reins.merge.bias

Training continued despite this, and I could evaluate the trained model with the command:

python tools/test.py configs/dinov2/rein_dinov2_mask2former_512x512_bs1x4.py path/to/my/trained/pth --backbone dinov2_converted.pth

However, the evaluation results are poor, so I suspect something is wrong. Could you advise on the correct approach here? Or is REIN suitable for binary segmentation tasks at all?

I would be grateful for your reply.

w1oves commented 5 months ago

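The missing keys for Rein are expected: the Rein parameters are newly added and are not part of the pretrained checkpoint. The patch-embedding size mismatch, from [1024, 3, 16, 16] in the checkpoint to [1024, 1, 16, 16] in your model, is definitely an error. DINOv2 was pretrained on 3-channel RGB images, so its patch embedding expects 3 input channels.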
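Why the channel count matters can be sketched in NumPy. This is an illustrative model, not code from Rein or DINOv2: the hypothetical helper `patch_embed` treats the patch embedding as a non-overlapping convolution (kernel size equal to stride, bias omitted) whose weight has shape (D, C, p, p). A D of 8 stands in for 1024 to keep the example small. A 1-channel image cannot be multiplied against 3-channel pretrained weights, but the same image replicated to 3 channels can.

```python
import numpy as np

def patch_embed(img: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical patch embedding: a Conv2d with kernel_size == stride, no bias.

    img:    (C, H, W), with H and W divisible by the patch size p
    weight: (D, C, p, p), e.g. (1024, 3, 16, 16) for the checkpoint in this issue
    returns (D, H // p, W // p)
    """
    _, c_w, p, _ = weight.shape
    c, h, w = img.shape
    if c != c_w:
        # The same kind of mismatch as in the error log above.
        raise ValueError(f"weight expects {c_w} input channels, image has {c}")
    # View the image as a grid of p x p blocks: (C, H/p, p, W/p, p)
    blocks = img.reshape(c, h // p, p, w // p, p)
    # Sum over channels and over the pixels inside each patch
    return np.einsum("cipjq,dcpq->dij", blocks, weight)

rng = np.random.default_rng(0)
weight_rgb = rng.standard_normal((8, 3, 16, 16))  # stand-in for 3-channel pretrained weights
gray = rng.standard_normal((1, 32, 32))          # a single-channel image

# patch_embed(gray, weight_rgb) would raise ValueError.
# Replicating the channel makes the image compatible with the unchanged weights.
gray_as_rgb = np.repeat(gray, 3, axis=0)          # (3, 32, 32)
tokens = patch_embed(gray_as_rgb, weight_rgb)     # (8, 2, 2)

# Equivalent view: a 1-channel kernel equal to the sum of the three RGB kernels.
tokens_sum = patch_embed(gray, weight_rgb.sum(axis=1, keepdims=True))
print(np.allclose(tokens, tokens_sum))  # True
```

The last check shows that replicating a gray channel keeps every pretrained weight in use, whereas shrinking the layer to 1 input channel leaves the checkpoint weight unloadable.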

crystalline02 commented 5 months ago


Thanks! Simply replicating my single-channel images to 3 channels worked well.
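A minimal sketch of that conversion for in-memory arrays, assuming channels-last images; `to_three_channels` is a hypothetical helper, not part of Rein. It should be applied to the input images only. Under the usual MMSegmentation convention, the label masks stay single-channel maps of class indices (0 and 1 here, with num_classes set to 2). For image files on disk, Pillow's `Image.open(path).convert("RGB")` performs the same replication.

```python
import numpy as np

def to_three_channels(img: np.ndarray) -> np.ndarray:
    """Hypothetical helper: replicate a single-channel image to (H, W, 3).

    Accepts (H, W) or (H, W, 1); a 3-channel input is returned unchanged.
    """
    if img.ndim == 2:
        img = img[:, :, np.newaxis]
    if img.ndim != 3:
        raise ValueError(f"expected a 2-D or 3-D array, got shape {img.shape}")
    if img.shape[2] == 3:
        return img
    if img.shape[2] != 1:
        raise ValueError(f"expected 1 or 3 channels, got {img.shape[2]}")
    # Copy the single channel into R, G and B; dtype is preserved.
    return np.repeat(img, 3, axis=2)

gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
rgb = to_three_channels(gray)
print(rgb.shape)  # (2, 3, 3)
```

Because all three channels hold identical values, channel order (RGB vs. BGR) does not matter for these images.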