feivelliu closed this issue 9 months ago.
Hi, you could easily create a new ViT backbone class in backbone.py.
Following are some tips:
- Set the window_size and window_block_indexes options (here).
- For layer-wise learning rate decay, there is a get_swin_layer_id function for Swin Transformer. You could use it as a reference when adding an implementation for ViT. (Layer-wise learning rate decay is a widely adopted trick when fine-tuning Masked Image Modeling pretrained models.)
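A minimal, framework-free sketch of both tips follows. It is not code from this repository: `make_window_block_indexes`, `num_windows`, `get_vit_layer_id`, `lr_scale`, and `build_param_groups` are hypothetical helpers. It also assumes parameter names of the form `backbone.blocks.<i>....` and a ViTDet-like layout where every third block keeps global attention while the other blocks use windowed attention. The layer-id function mirrors what a Swin layer-id function does for stages, but maps each ViT block to its own layer.

```python
from typing import Any, Dict, Iterable, List, Tuple


def make_window_block_indexes(depth: int, global_every: int) -> List[int]:
    """Hypothetical helper: indexes of blocks that use windowed attention.

    The last block of every group of `global_every` blocks keeps global
    attention; all other blocks use windowed attention.
    """
    return [i for i in range(depth) if (i + 1) % global_every != 0]


def num_windows(tokens_h: int, tokens_w: int, window_size: int) -> int:
    """Hypothetical helper: windows per image after padding the token grid
    up to a multiple of `window_size`."""
    pad_h = (window_size - tokens_h % window_size) % window_size
    pad_w = (window_size - tokens_w % window_size) % window_size
    return ((tokens_h + pad_h) // window_size) * ((tokens_w + pad_w) // window_size)


def get_vit_layer_id(param_name: str, num_layers: int) -> int:
    """Hypothetical ViT counterpart of a Swin layer-id function.

    Assumes names like "backbone.blocks.<i>.attn.qkv.weight".
    Embeddings -> 0, block i -> i + 1, everything else (neck, heads) -> num_layers + 1.
    """
    if param_name.startswith("backbone."):
        if ".pos_embed" in param_name or ".patch_embed" in param_name:
            return 0
        if ".blocks." in param_name:
            block_idx = param_name.split(".blocks.")[1].split(".")[0]
            return int(block_idx) + 1
    return num_layers + 1


def lr_scale(param_name: str, num_layers: int, decay_rate: float) -> float:
    """Layer-wise decay: deeper layers keep more of the base learning rate."""
    layer_id = get_vit_layer_id(param_name, num_layers)
    return decay_rate ** (num_layers + 1 - layer_id)


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    base_lr: float,
    num_layers: int,
    decay_rate: float,
) -> List[Dict[str, Any]]:
    """Group parameters by layer id, with one learning rate per group."""
    groups: Dict[int, Dict[str, Any]] = {}
    for name, param in named_params:
        layer_id = get_vit_layer_id(name, num_layers)
        if layer_id not in groups:
            scale = decay_rate ** (num_layers + 1 - layer_id)
            groups[layer_id] = {"params": [], "lr": base_lr * scale}
        groups[layer_id]["params"].append(param)
    return [groups[k] for k in sorted(groups)]


if __name__ == "__main__":
    # Assumed ViT-B-like setup: 12 blocks, global attention every 3rd block.
    print(make_window_block_indexes(12, 3))  # [0, 1, 3, 4, 6, 7, 9, 10]
    # Assumed 1024x1024 input with 16x16 patches -> 64x64 tokens, window_size 14.
    print(num_windows(64, 64, 14))  # 25
    for name in ["backbone.patch_embed.proj.weight",
                 "backbone.blocks.11.mlp.fc1.weight",
                 "neck.conv.weight"]:
        print(name, lr_scale(name, num_layers=12, decay_rate=0.7))
```

The list returned by `build_param_groups` uses the `params`/`lr` dict layout that PyTorch optimizers accept as per-parameter-group options. In a real model you would pass it `model.named_parameters()` and hand the result to an optimizer such as `torch.optim.AdamW`. Grouping by layer id rather than creating one group per parameter keeps the optimizer's group count small.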
Very happy to see your code! I am very interested in applying it to a plain ViT. Could you provide some related tips? Thank you so much!