Mushtaqml opened 2 years ago
@Mushtaqml yeah, you simply have to change the way you encode the patches to be 3D
change https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py#L92 to
```python
# batch, channel, height, width, depth
Rearrange('b c (h p1) (w p2) (d p3) -> b (h w d) (p1 p2 p3 c)')
```
You'll have to calculate the appropriate input dimension, which will be 3 * p1 * p2 * p3, and then also set the appropriate length for the absolute positional encoding.
@lucidrains Hello, thanks for your help. Could you help me feed my CT scan images into DINOv2? The input for the model is in the following format:

```py
torch.Size([16, 1, 10, 18, 18])  # Batch, Channel, Depth, Height, Width
```

How can I rearrange it?
Hi!
I would like to thank you first for such a good and up-to-date repo on Vision Transformers.
I want to know whether I can use 3D medical images to pretrain the ViT. Do I need to make some changes to the sample code you shared?
Thanks