wusize / CLIPSelf

[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
https://arxiv.org/abs/2310.01403

The model and loaded state dict do not match exactly #24

Closed. WangZz777 closed this issue 3 weeks ago.

WangZz777 commented 3 weeks ago

Thank you so much for sharing. I'm in the evaluation phase, and loading either your released checkpoints (for example fvit_eva_vitb16_ovcoco_clipself_proposals.pth) or my own trained latest.pth produces the following warning:

The model and loaded state dict do not match exactly

unexpected key in source state_dict: backbone.visual.rope.flag, backbone.visual.rope.freqs_cos_1600, backbone.visual.rope.freqs_sin_1600, backbone.visual.blocks.0.attn.rope.flag, backbone.visual.blocks.0.attn.rope.freqs_cos_1600, backbone.visual.blocks.0.attn.rope.freqs_sin_1600, backbone.visual.blocks.1.attn.rope.flag, backbone.visual.blocks.1.attn.rope.freqs_cos_1600, backbone.visual.blocks.1.attn.rope.freqs_sin_1600, backbone.visual.blocks.2.attn.rope.flag, backbone.visual.blocks.2.attn.rope.freqs_cos_1600, backbone.visual.blocks.2.attn.rope.freqs_sin_1600, backbone.visual.blocks.3.attn.rope.flag, backbone.visual.blocks.3.attn.rope.freqs_cos_1600, backbone.visual.blocks.3.attn.rope.freqs_sin_1600, backbone.visual.blocks.4.attn.rope.flag, backbone.visual.blocks.4.attn.rope.freqs_cos_1600, backbone.visual.blocks.4.attn.rope.freqs_sin_1600, backbone.visual.blocks.5.attn.rope.flag, backbone.visual.blocks.5.attn.rope.freqs_cos_1600, backbone.visual.blocks.5.attn.rope.freqs_sin_1600, backbone.visual.blocks.6.attn.rope.flag, backbone.visual.blocks.6.attn.rope.freqs_cos_1600, backbone.visual.blocks.6.attn.rope.freqs_sin_1600, backbone.visual.blocks.7.attn.rope.flag, backbone.visual.blocks.7.attn.rope.freqs_cos_1600, backbone.visual.blocks.7.attn.rope.freqs_sin_1600, backbone.visual.blocks.8.attn.rope.flag, backbone.visual.blocks.8.attn.rope.freqs_cos_1600, backbone.visual.blocks.8.attn.rope.freqs_sin_1600, backbone.visual.blocks.9.attn.rope.flag, backbone.visual.blocks.9.attn.rope.freqs_cos_1600, backbone.visual.blocks.9.attn.rope.freqs_sin_1600, backbone.visual.blocks.10.attn.rope.flag, backbone.visual.blocks.10.attn.rope.freqs_cos_1600, backbone.visual.blocks.10.attn.rope.freqs_sin_1600, backbone.visual.blocks.11.attn.rope.flag, backbone.visual.blocks.11.attn.rope.freqs_cos_1600, backbone.visual.blocks.11.attn.rope.freqs_sin_1600

What's the reason for this?
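
For anyone hitting the same warning, here is a small sketch for inspecting it. "Unexpected key in source state_dict" means the checkpoint contains entries that the constructed model has no slot for. Every key listed above ends in `rope.flag`, `rope.freqs_cos_1600`, or `rope.freqs_sin_1600`, which suggests (an assumption, not confirmed in this thread) that they are cached rotary position-embedding tables rather than learned weights. The helper names `split_rope_keys` and `summarize`, the regex, and the extra `proj.weight` key are all hypothetical and written for illustration only. The code works on a plain dict, so it does not need PyTorch.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the key names shown in the warning above.
ROPE_KEY = re.compile(r"(^|\.)rope\.(flag|freqs_cos_\d+|freqs_sin_\d+)$")


def split_rope_keys(state_dict):
    """Return (kept, dropped): entries without / with a rope-style key name."""
    kept, dropped = {}, {}
    for key, value in state_dict.items():
        target = dropped if ROPE_KEY.search(key) else kept
        target[key] = value
    return kept, dropped


def summarize(keys):
    """Count keys by their last dotted component, e.g. 'freqs_cos_1600'."""
    return Counter(key.rsplit(".", 1)[-1] for key in keys)


if __name__ == "__main__":
    # Rebuild the key names from the warning; the values are placeholders.
    prefixes = ["backbone.visual.rope"] + [
        f"backbone.visual.blocks.{i}.attn.rope" for i in range(12)
    ]
    names = ["flag", "freqs_cos_1600", "freqs_sin_1600"]
    fake_ckpt = {f"{p}.{n}": None for p in prefixes for n in names}
    # Hypothetical ordinary weight, added only to show that it is kept.
    fake_ckpt["backbone.visual.blocks.0.attn.proj.weight"] = None

    kept, dropped = split_rope_keys(fake_ckpt)
    print(len(dropped), dict(summarize(dropped)))
    # prints: 39 {'flag': 13, 'freqs_cos_1600': 13, 'freqs_sin_1600': 13}
    print(list(kept))
    # prints: ['backbone.visual.blocks.0.attn.proj.weight']
```

If every unexpected key falls into the dropped group and no "missing keys" line appears in the warning, the learned weights were most likely loaded. Whether the evaluation results are unaffected still needs to be confirmed against the numbers reported by the authors.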