lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
MIT License

How to get the feature map of the vit encoder #191

Open bladetin opened 2 years ago

bladetin commented 2 years ago

I'm working on image fusion and want to combine a CNN with ViT. I'd like to take the feature map produced by the ViT encoder and process it further. How can I get the feature map output by the ViT encoder? Looking forward to your reply, thank you!

XuweiC commented 2 years ago

Try registering a forward hook (register_forward_hook) on the module whose output you need; the hook receives that module's output, which is the feature map you are after.
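For anyone landing here later, a minimal sketch of the hook approach. To keep it self-contained it uses a hypothetical stand-in model (`TinyViT`, built from `torch.nn` only) rather than the repo's `ViT` class; the only assumption carried over is that the encoder lives in a submodule called `transformer`. The dictionary name `features` and the hook function `save_output` are made up for illustration.

```python
import torch
from torch import nn


class TinyViT(nn.Module):
    """Hypothetical stand-in: an encoder submodule followed by a classifier head."""

    def __init__(self, dim=32, num_classes=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=64, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.mlp_head = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        x = self.transformer(tokens)          # (batch, tokens, dim)
        return self.mlp_head(x.mean(dim=1))   # (batch, num_classes)


model = TinyViT().eval()
features = {}


def save_output(module, inputs, output):
    # Called after every forward pass of the hooked module.
    features["encoder"] = output.detach()


# Attach the hook to the encoder, run a normal forward pass, then detach the hook.
handle = model.transformer.register_forward_hook(save_output)

tokens = torch.randn(2, 16, 32)  # (batch, tokens, dim), already embedded
with torch.no_grad():
    logits = model(tokens)
handle.remove()

encoder_out = features["encoder"]
print(encoder_out.shape, logits.shape)
```

The model's own forward pass is left untouched, so the same pattern should carry over to a real ViT by hooking its encoder submodule instead. Calling `handle.remove()` afterwards avoids capturing outputs on every later call.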

bladetin commented 2 years ago

Thank you!

hlr7999 commented 2 years ago

img = self.vit.to_patch_embedding(img)
img += self.vit.pos_embedding[:, :img.shape[1]]
img = self.vit.dropout(img)
img = self.vit.transformer(img)

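A slightly fuller sketch of the same idea, written as a standalone function. Two things here are assumptions rather than something stated in this thread: as far as I recall, the repo's `ViT.forward` prepends a class token before adding `pos_embedding`, so this version does the same (the four lines above skip that step, which would shift the positional embeddings by one slot); and the final reshape into a `(batch, dim, H, W)` grid is just one convenient layout for fusing with CNN features. `StandInViT` and `PatchEmbed` are hypothetical stand-ins that only mimic the attribute names used above, so the example runs without the library installed.

```python
import torch
from torch import nn


class PatchEmbed(nn.Module):
    """Hypothetical patch embedding: non-overlapping patches -> (batch, N, dim)."""

    def __init__(self, patch, dim):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, img):
        return self.proj(img).flatten(2).transpose(1, 2)


class StandInViT(nn.Module):
    """Hypothetical stand-in exposing the attribute names used in this thread."""

    def __init__(self, image_size=32, patch=8, dim=32):
        super().__init__()
        num_patches = (image_size // patch) ** 2
        self.to_patch_embedding = PatchEmbed(patch, dim)
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))
        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim))
        self.dropout = nn.Dropout(0.0)
        layer = nn.TransformerEncoderLayer(dim, 4, 64, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)


def encoder_feature_map(vit, img, patch):
    """Run only the encoder and return patch tokens as a (B, D, H, W) map."""
    x = vit.to_patch_embedding(img)                # (B, N, D)
    b, n, d = x.shape
    cls = vit.cls_token.expand(b, -1, -1)          # (B, 1, D)
    x = torch.cat((cls, x), dim=1)                 # (B, N + 1, D)
    x = x + vit.pos_embedding[:, : n + 1]
    x = vit.dropout(x)
    x = vit.transformer(x)
    patches = x[:, 1:]                             # drop the class token
    h, w = img.shape[-2] // patch, img.shape[-1] // patch
    return patches.transpose(1, 2).reshape(b, d, h, w)


vit = StandInViT().eval()
img = torch.randn(2, 3, 32, 32)
with torch.no_grad():
    fmap = encoder_feature_map(vit, img, patch=8)
print(fmap.shape)
```

If the class token is not wanted at all, the alternative is to slice `pos_embedding[:, 1:n + 1]` for the patch tokens instead, though the encoder then sees a different input than it was trained on.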
bladetin commented 2 years ago

Thank you!