Closed: PkuRainBow closed this issue 3 years ago
Thanks for your interest in our work! Now that the code is released, I should be able to release Seg-B/8 around next week, along with the ViT-B/8 checkpoint converted from JAX to PyTorch. I will keep you posted.
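For anyone curious what such a conversion involves, here is a minimal, hypothetical sketch, not the actual conversion script: it assumes the JAX checkpoint is a flat `.npz` of named arrays, whereas the real ViT checkpoints nest parameters and split attention weights, so a full converter needs a proper key mapping on top of this.

```python
import numpy as np
import torch

def npz_to_state_dict(npz_path):
    """Convert a flat .npz of named arrays into a PyTorch state dict (sketch)."""
    arrays = np.load(npz_path)
    state_dict = {}
    for name in arrays.files:
        tensor = torch.from_numpy(arrays[name])
        # JAX stores dense kernels as (in_features, out_features);
        # PyTorch nn.Linear expects (out_features, in_features).
        if name.endswith("kernel") and tensor.ndim == 2:
            tensor = tensor.t().contiguous()
        state_dict[name.replace("/", ".")] = tensor
    return state_dict

# torch.save(npz_to_state_dict("ViT-B_8.npz"), "vit_b8.pth")
```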
Seg-B/8 would be great!
How about other patch sizes, like patch size 4? When can you release them?
As far as I know, there is no such ViT model available.
We provide models pre-trained on ImageNet-21k for the following architectures: ViT-B/16, ViT-B/32, ViT-L/16 and ViT-L/32. We provide the same models pre-trained on ImageNet-21k and fine-tuned on ImageNet.
https://github.com/google-research/vision_transformer#available-vit-models
We release models with a Vision Transformer backbone initialized from the improved ViT models.
But I see patch size 8 in the code. Can you release those checkpoints? @rstrudel
Hi, sorry for the delay. We had quite a lot of work to do updating both the checkpoints and the paper. I just released Seg-B/8 checkpoints fine-tuned from the ViT-B/8 backbone from Steiner et al.
@BruceYu-Bit as mentioned by @woctezuma, there are no ViT models with 4x4 patches. The smaller the patch, the longer the sequence passed to the transformer, and with vanilla transformer blocks the quadratic cost of self-attention makes training on 4x4 patches prohibitively expensive as far as I know. Cheaper forms of attention, or MLP-Mixer-like networks, might be better suited to training models at such a small patch size.
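To make the cost argument concrete, here is a back-of-the-envelope sketch in plain Python, assuming a 224x224 input (the usual ViT resolution): the token count grows as (image_size / patch_size)^2, and vanilla self-attention scales roughly quadratically in the token count.

```python
# Rough illustration: tokens per image and relative self-attention cost
# for different patch sizes at a fixed 224x224 input resolution.
image_size = 224

base_tokens = (image_size // 16) ** 2  # ViT-B/16 baseline: 196 tokens
for patch in (16, 8, 4):
    tokens = (image_size // patch) ** 2
    rel_attention = (tokens / base_tokens) ** 2  # attention is ~O(tokens^2)
    print(f"patch {patch:2d}: {tokens:5d} tokens, ~{rel_attention:4.0f}x attention cost vs /16")
```

At 4x4 patches the attention matrices alone are roughly 256 times larger than for /16, which is why cheaper attention variants or Mixer-style blocks are suggested above.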
Hi @rstrudel,
I just released Seg-B/8 checkpoints fine-tuned from the ViT-B/8 backbone from Steiner et al.
Could you please share the weights of the ViT-B/8 backbone with us? Thank you very much in advance.
Great work on semantic segmentation!
I find that resolution is important for the final performance, e.g., Seg-B/8.
However, I could not find ImageNet-pretrained checkpoints with patch size 8 in the timm library.
It would be great if you could help address this!
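For reference, here is a minimal sketch of how to query timm's model registry for available ViT variants. `timm.list_models` is part of the public API; which names come back depends on the installed version, and a patch-8 entry may or may not be present.

```python
import timm

# List all pretrained ViT variants registered in this timm install.
vit_names = timm.list_models("vit_*", pretrained=True)

# Filter for patch-8 backbones, if any exist in this version.
patch8 = [name for name in vit_names if "patch8" in name]
print(patch8 or "no patch-8 ViT checkpoints in this timm version")
```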