rstrudel / segmenter

[ICCV2021] Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation
MIT License

Ask about the "Seg-B/8" #3

Closed PkuRainBow closed 3 years ago

PkuRainBow commented 3 years ago

Great work on semantic segmentation!

I find that the resolution is important for the final performance, e.g., Seg-B/8.

However, I could not find ImageNet pre-trained checkpoints with patch size 8 in the timm library.

It would be great if you could help address my concern!

rstrudel commented 3 years ago

Thanks for your interest in our work! Now that the code is released, I should be able to release Seg-B/8 next week, along with the ViT-B/8 checkpoint converted from jax to torch. I will keep you posted.
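As an aside, a common gotcha when converting jax/flax ViT weights to PyTorch is layout: flax `Dense` kernels are stored as `(in_features, out_features)`, while `torch.nn.Linear.weight` is `(out_features, in_features)`, so each dense kernel must be transposed. A minimal sketch with a hypothetical helper name (not the actual conversion script used here):

```python
import numpy as np

def convert_dense_kernel(kernel):
    """Transpose a flax Dense kernel (in, out) to torch Linear layout (out, in).

    Hypothetical helper for illustration; a real conversion script also has to
    remap parameter names and handle attention/layer-norm weights.
    """
    return np.ascontiguousarray(np.asarray(kernel).T)

# e.g. a ViT-B MLP layer: in_features=768, out_features=3072
k = np.zeros((768, 3072))
w = convert_dense_kernel(k)
print(w.shape)  # (3072, 768)
```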

PavelGrigorev commented 3 years ago

Seg-B/8 would be great!

BruceYu-Bit commented 3 years ago

How about other patch sizes, like patch size 4? When can you release them?

woctezuma commented 3 years ago

> How about other patch sizes, like patch size 4? When can you release them?

There is no such ViT model available, as far as I know.

> We provide models pre-trained on ImageNet-21k for the following architectures: ViT-B/16, ViT-B/32, ViT-L/16 and ViT-L/32. We provide the same models pre-trained on ImageNet-21k and fine-tuned on ImageNet.

https://github.com/google-research/vision_transformer#available-vit-models

> We release models with a Vision Transformer backbone initialized from the improved ViT models.

https://github.com/rstrudel/segmenter#model-zoo

BruceYu-Bit commented 3 years ago

But I see patch size 8 in the code. Can you release them? @rstrudel

rstrudel commented 3 years ago

Hi, sorry for the delay. We had quite a lot of work to update both the checkpoints and the paper. I just released Seg-B/8 checkpoints fine-tuned from the ViT-B/8 backbone from Steiner et al.

rstrudel commented 3 years ago

@BruceYu-Bit as mentioned by @woctezuma, there are no ViT models with 4x4 patches. The smaller the patch, the longer the token sequence passed to the transformer. As far as I know, with vanilla transformer blocks, training on 4x4 patches is simply too costly. Cheaper forms of attention or MLP-Mixer-like networks might be better suited to training on such small patch sizes.
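To make the cost argument concrete, a small sketch (my own illustration, not code from the repo): a square image of side `H` with patch size `p` yields `(H/p)^2` tokens, and vanilla self-attention is quadratic in the token count, so halving the patch size quadruples the tokens and multiplies the attention cost by 16.

```python
def num_tokens(image_size, patch_size):
    """Number of patch tokens for a square image (class token excluded)."""
    return (image_size // patch_size) ** 2

# Token counts and attention-matrix sizes for a 512x512 image.
for p in (16, 8, 4):
    n = num_tokens(512, p)
    # Vanilla self-attention builds an n x n attention matrix per head.
    print(f"patch {p:2d}: {n:6d} tokens, attention matrix {n * n:,} entries")
```

For a 512x512 image, patch 16 gives 1,024 tokens, patch 8 gives 4,096, and patch 4 gives 16,384, i.e. an attention matrix of over 268 million entries per head at patch 4, which is why 4x4-patch ViTs are impractical to train with vanilla attention.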

vobecant commented 3 years ago

Hi @rstrudel,

> I just released Seg-B/8 checkpoints fine-tuned from the ViT-B/8 backbone from Steiner et al.

Could you please share the ViT-B/8 weights with us? Thank you very much in advance.