hustvl / TopFormer

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022
Other

Can you open-source the ImageNet pretraining code? #32

Open shiyutang opened 1 year ago

shiyutang commented 1 year ago

I have tried pretraining the backbone + PPA + SASE on ImageNet, with a conv-BN-conv + linear head, but I cannot reproduce the 75.3 top-1 accuracy of the base model. Could you provide the source code for pretraining on ImageNet?

speedinghzl commented 1 year ago

Please refer to https://github.com/fudan-zvg/SeaFormer for the ImageNet training script.

shiyutang commented 1 year ago

Thanks a lot, I will try it. Did you do anything special during ImageNet pretraining? I cannot reproduce the ImageNet top-1 accuracy. I managed to reach the target accuracy by distilling the model, but the mIoU after training on ADE20K is still unsatisfactory.
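The thread does not say which distillation objective was used. Below is a minimal single-sample sketch in pure Python. It assumes the common soft-target formulation: KL divergence between the temperature-softened teacher and student distributions, scaled by T², blended with hard-label cross-entropy. The names `kd_loss` and `total_loss`, and the defaults for `alpha` and `temperature`, are hypothetical. They are not taken from TopFormer or SeaFormer.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-softmax of a list of logits divided by `temperature` (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return kl * temperature ** 2


def total_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=4.0):
    """Blend hard-label cross-entropy with the soft-target distillation term."""
    ce = -log_softmax(student_logits)[label]
    return (1 - alpha) * ce + alpha * kd_loss(student_logits, teacher_logits, temperature)


# Usage: a student that disagrees with the teacher gets a positive KD term.
print(total_loss([2.0, 0.5, -1.0], [0.5, 2.0, -1.0], label=1))
```

Setting `alpha=0` recovers plain cross-entropy. A real pipeline would run the same computation on batched tensors.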

speedinghzl commented 1 year ago

The training hyperparameters may matter. The training details are listed here: https://github.com/fudan-zvg/SeaFormer/tree/main/seaformer-cls#training.

shiyutang commented 1 year ago

Thank you very much :) Training is underway. I have read SeaFormer and think it is amazing. Good work!

shiyutang commented 1 year ago

After training with the seaformer-cls framework, ImageNet top-1 accuracy went from 72.6 to 74.9 (the original implementation reports 75.3). However, the segmentation mIoU on ADE20K stays almost the same at 36.68 (it should be near 39.0). What could I be missing?
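When comparing mIoU numbers such as 36.68 and 39.0, make sure both are computed the same way. This minimal sketch assumes the usual dataset-level definition. A confusion matrix is accumulated over all validation pixels. Then per-class IoU = TP / (TP + FP + FN) is averaged over the classes that actually appear. The ignore label 255 and the function names are hypothetical choices for illustration. Averaging per-image IoUs instead would generally give a different number.

```python
def confusion_matrix(preds, labels, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs over all pixels; ignored labels are skipped."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for pred, gt in zip(preds, labels):
        if gt == ignore_index:
            continue
        cm[gt][pred] += 1
    return cm


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN), skipping classes with an empty union."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # GT is c, predicted something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, GT is something else
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Usage: flattened predictions and labels for every pixel of the validation set.
cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 255], num_classes=2)
print(mean_iou(cm))
```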

speedinghzl commented 1 year ago

Thanks for the update. It seems that you have reproduced the classification accuracy. Did you encounter NaN or anything abnormal when training on ImageNet? Once ImageNet pretraining is done, there is nothing special about training the segmentation model.
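One way to answer the NaN question from a training log is to scan the recorded loss values. This sketch assumes the losses have already been parsed into a list of non-negative floats. The name `find_anomalies`, the window size, and the spike factor are hypothetical choices for illustration. They are not part of either code base.

```python
import math
from collections import deque


def find_anomalies(losses, window=50, spike_factor=3.0):
    """Return (step, value, kind) for NaN/inf losses and for sudden spikes.

    A spike is a finite loss larger than `spike_factor` times the mean of the
    previous `window` finite losses.
    """
    anomalies = []
    history = deque(maxlen=window)  # keeps only the most recent finite losses
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            anomalies.append((step, value, "non-finite"))
            continue
        if len(history) == window and value > spike_factor * (sum(history) / window):
            anomalies.append((step, value, "spike"))
        history.append(value)
    return anomalies


# Usage: a short synthetic log with one NaN and one spike.
log = [1.0] * 5 + [float("nan")] + [1.0] * 3 + [10.0]
for step, value, kind in find_anomalies(log, window=3):
    print(step, value, kind)
```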

Are you using the official TopFormer segmentation code or MMSegmentation? There is a small difference between them. Also, can you reproduce the 39.2 mIoU with the provided ImageNet-pretrained model?

shiyutang commented 1 year ago

With the provided pretrained model and the official TopFormer code, I reproduced 38.3 mIoU (bs=16). Nothing looked strange during ImageNet training.