if my input x is [10,3,224,224], what should extra_tokens["channels"] be?

insitro / ChannelViT

Channel Vision Transformers: An Image Is Worth C x 16 x 16 Words

https://arxiv.org/abs/2309.16108

if my input x is [10,3,224,224], what should extra_tokens["channels"] be? #3

Closed daixiangzi closed 4 months ago

daixiangzi commented 9 months ago

When I use hcs_channel_vit.py:

import torch

x = torch.randn(10, 3, 224, 224)
patch_embed = PatchEmbedPerChannel(
    img_size=224,
    patch_size=32,
    in_chans=3,
    embed_dim=387,
    enable_sample=False,
)
y = patch_embed(x)

daixiangzi commented 9 months ago

https://github.com/insitro/ChannelViT/blob/c286fa264e95a992183dc721f4185c60ca8f8f08/channelvit/backbone/hcs_channel_vit.py#L51

The result of self.channel_embed should be 3-dimensional, but with your code it is 2-dimensional? My setting:

cur_channel_embed = self.channel_embed(torch.tensor([0,1,2]))

Error:

RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 2 is not equal to len(dims) = 3

priyarana commented 5 months ago

Hi, I am getting the same error. I want to use pretrained camelyon_channelvit_small_p8_with_hcs_supervised for images with 7 channels. Did you solve this issue?

srinivasans-insitro commented 5 months ago

@daixiangzi extra_tokens["channels"] should contain the channel indices for each sample in the batch and should have shape batch_size x n_channels.
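
For the input in the question (x of shape [10, 3, 224, 224]), the shape requirement can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the channel embedding behaves like a plain nn.Embedding lookup, and the embed_dim value and the permute order are chosen only for demonstration.

```python
import torch
import torch.nn as nn

batch_size, n_channels, embed_dim = 10, 3, 384  # embed_dim is illustrative

# Assumption: the channel embedding acts like an nn.Embedding lookup table.
channel_embed = nn.Embedding(n_channels, embed_dim)

# One row of channel indices per sample: shape (batch_size, n_channels).
channels = torch.arange(n_channels).unsqueeze(0).repeat(batch_size, 1)
extra_tokens = {"channels": channels}

# A 2-D index tensor yields a 3-D embedding: (batch_size, n_channels, embed_dim).
out_2d_index = channel_embed(extra_tokens["channels"])

# A 1-D index tensor yields only a 2-D embedding: (n_channels, embed_dim).
out_1d_index = channel_embed(torch.tensor([0, 1, 2]))

# A three-axis permute (order chosen for illustration) needs the 3-D result;
# applying it to out_1d_index would raise a RuntimeError about mismatched dims.
permuted = out_2d_index.permute(2, 1, 0)
```

Passing torch.tensor([0,1,2]) directly therefore gives a 2-D result, which is consistent with the permute error reported above.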

For example, in the ImageNet dataset, we return a dictionary containing the channels of each sample, which is collated using PyTorch's default_collate function. default_collate maps Mapping[K, V_i] -> Mapping[K, default_collate([V_1, V_2, …])], so the resulting extra_tokens['channels'] has shape batch_size x n_channels.
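
The collation behaviour described above can be checked in isolation. The sketch below is not the repository's ImageNet dataset: make_sample is a hypothetical helper that stands in for one dataset item, returning a random image together with a dictionary of channel indices.

```python
import torch
from torch.utils.data.dataloader import default_collate


def make_sample(n_channels=3, size=224):
    """Hypothetical stand-in for one dataset item: (image, extra tokens)."""
    image = torch.randn(n_channels, size, size)
    extra = {"channels": torch.arange(n_channels)}  # shape (n_channels,)
    return image, extra


# Ten per-sample items, as a DataLoader would gather them for one batch.
samples = [make_sample() for _ in range(10)]

# default_collate stacks the images and collates the dictionaries key by key.
images, extra_tokens = default_collate(samples)
```

Here images has shape (10, 3, 224, 224) and extra_tokens["channels"] has shape (10, 3), with every row equal to [0, 1, 2].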

srinivasans-insitro commented 5 months ago

> Hi, I am getting the same error. I want to use pretrained camelyon_channelvit_small_p8_with_hcs_supervised for images with 7 channels. Did you solve this issue?

@priyarana The camelyon_channelvit_small_p8_with_hcs_supervised model was trained on 3-channel inputs and will only work if the number of channels is <= 3. Is there any specific structure to the 7 channels that motivated you to use the model trained on the 3-channel Camelyon dataset?

priyarana commented 5 months ago

Thanks Sri! My images have 7 channels, and I am thinking of using the pretrained "camelyon_channelvit_small_p8_with_hcs_supervised" for feature extraction. To begin with, I am trying this library with 3 channels for now, and I am getting this error:

/torch/hub/insitro_ChannelViT_main/channelvit/backbone/hcs_channel_vit.py", line 63, in forward
    cur_channel_embed = self.channel_embed(extra_tokens["channels"])  # B, Cin, embed_dim=Cout
KeyError: 'channels'

Just wondering if you have any inputs on this?

srinivasans-insitro commented 5 months ago

The error message KeyError: 'channels' indicates that you're not returning channels in your dataset alongside the input image. Please refer to the ImageNet dataset for an example.
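
One possible way to add the missing key is to wrap an existing image dataset so that each item also carries its channel indices. This is a rough sketch under stated assumptions: WithChannels and RandomImages are hypothetical classes written for this illustration (they are not part of ChannelViT), and the base dataset is assumed to return a single image tensor of shape (C, H, W).

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomImages(Dataset):
    """Hypothetical base dataset that returns image tensors only."""

    def __init__(self, length=8, n_channels=3, size=224):
        self.length, self.n_channels, self.size = length, n_channels, size

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return torch.randn(self.n_channels, self.size, self.size)


class WithChannels(Dataset):
    """Hypothetical wrapper that adds a 'channels' entry to every item."""

    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image = self.base[idx]
        # Assumption: every channel of the image is used, indexed 0..C-1.
        channels = torch.arange(image.shape[0])
        return image, {"channels": channels}


loader = DataLoader(WithChannels(RandomImages()), batch_size=4)
images, extra_tokens = next(iter(loader))
```

After collation, extra_tokens contains the "channels" key with shape (4, 3), so the lookup that raised the KeyError has something to read.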