Closed: fbragman closed this issue 2 years ago
Hi @fbragman,
Thanks for your question. We did evaluate performance when training from scratch on ADE20K; the results are in the appendix of our paper. The ablation covers the ViT + Linear configuration.
Pre-training is key in general for detection and localization tasks such as segmentation. The main reason is that downstream datasets (ADE20K, Cityscapes, Pascal, etc.) are simply too small compared to classification datasets such as ImageNet or ImageNet-21k. Current deep learning models need far more data to reach good performance, and this holds whether the backbone is a CNN or a Transformer.
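In case a concrete starting point helps, below is a minimal sketch (not the authors' exact code) of a ViT + Linear baseline whose encoder is initialised from ImageNet pre-trained weights via the timm library. The model name `vit_base_patch16_384`, the class `ViTLinearSeg`, and the decoder details are illustrative assumptions, not this repo's API; the `pretrained=True` flag is the step that matters here.

```python
# Minimal sketch (illustrative, not the authors' setup): a ViT encoder with
# ImageNet pre-trained weights plus a linear per-patch classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm


class ViTLinearSeg(nn.Module):
    """ViT encoder + linear per-patch decoder (illustrative baseline)."""

    def __init__(self, num_classes: int, pretrained: bool = True):
        super().__init__()
        # pretrained=True loads ImageNet-trained weights; setting it to False
        # reproduces the from-scratch regime discussed above.
        self.encoder = timm.create_model(
            "vit_base_patch16_384", pretrained=pretrained, num_classes=0
        )
        self.patch_size = 16
        self.head = nn.Linear(self.encoder.embed_dim, num_classes)

    def forward(self, x):
        b, _, h, w = x.shape
        # forward_features returns the token sequence; for this ViT variant
        # the first token is the CLS token (assumed here), which we drop.
        feats = self.encoder.forward_features(x)  # (B, 1 + N, D)
        feats = feats[:, 1:]                      # (B, N, D) patch tokens
        logits = self.head(feats)                 # (B, N, num_classes)
        gh, gw = h // self.patch_size, w // self.patch_size
        logits = logits.transpose(1, 2).reshape(b, -1, gh, gw)
        # Upsample patch-level logits back to input resolution.
        return F.interpolate(
            logits, size=(h, w), mode="bilinear", align_corners=False
        )


model = ViTLinearSeg(num_classes=150, pretrained=True)  # e.g. 150 ADE20K classes
out = model(torch.randn(1, 3, 384, 384))                # -> (1, 150, 384, 384)
```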
Hi,
I am trying to use the baseline model (linear decoder) described in the paper as a baseline for some of my work. However, I do not have access to pre-trained ImageNet weights, and the model fails to learn, converging at around 0.25 mean Dice on the Cityscapes training set. This is after hyperparameter optimisation over SGD, Adam, and several learning-rate schedulers.
I was wondering whether, during your experiments, you saw similar performance when the transformer backbones were not initialised with pre-trained weights. Was this tested for both the baseline (ViT + Linear) and your proposed method (ViT + Mask)?
Thank you