yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Hard to train #40

Closed · huixiancheng closed this issue 3 years ago

huixiancheng commented 3 years ago

Hi, dear @yuanli2333. I am trying to use T2T-ViT for downstream semantic segmentation tasks. However, as we know, a ViT backbone is very hard to train, and the default number of training epochs on ImageNet is 300. I have tried two different network structures with T2T-ViT-14.

The first was trained with the SGD optimizer and cosine warmup. After 120 epochs, the loss curve is as follows: [screenshot: SGD loss curve]

The second was trained with the Adam optimizer and cosine warmup (I did not use timm.create_optimizer to set AdamW, since I need to set different learning rates for different blocks; see the sketch below). The learning rate settings are similar to yours. After 40 epochs, the loss curve is as follows: [screenshot: Adam loss curve]

It looks like the second training is much better and the loss is still decreasing, but I'm not sure whether it is on the right path. (By my calculation, it would take 6 days to train 300 epochs on a single 3090 GPU, so I don't have time for trial and error :sob::sob::sob:) Could you show me your training log as a reference, or give me some advice? Thank you very much.
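For reference, a minimal sketch of the per-block learning rate setup with plain PyTorch parameter groups (the module names, lr ratios, and the tiny stand-in model are illustrative assumptions, not the actual T2T-ViT attribute names):

```python
import torch
import torch.nn as nn

# Stand-in modules so the sketch runs on its own; substitute the real
# T2T-ViT backbone and your segmentation head.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.tokens_to_token = nn.Linear(8, 8)                         # placeholder for the T2T module
        self.blocks = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))  # placeholder transformer blocks
        self.head = nn.Linear(8, 4)                                    # placeholder decode head

model = Net()
base_lr = 1e-3  # illustrative value

# One parameter group per block, each with its own learning rate:
optimizer = torch.optim.Adam(
    [
        {"params": model.tokens_to_token.parameters(), "lr": base_lr * 0.1},
        {"params": model.blocks.parameters(),          "lr": base_lr * 0.5},
        {"params": model.head.parameters(),            "lr": base_lr},
    ],
    weight_decay=0.0,
)
```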

huixiancheng commented 3 years ago

Is it necessary to use AdamW as the optimizer? If it is, I will use it.

yuanli2333 commented 3 years ago

Hi, I have uploaded the training log of T2T-ViT-14 here. You can compare it with your training.

I think your loss curve is normal. Empirically, we use Adam or AdamW for vision transformers; SGD can work, but it seems no better than Adam/AdamW.
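For the AdamW path, a minimal sketch using torch.optim.AdamW with timm's cosine scheduler (the epoch count, warmup length, and learning rates below are illustrative placeholders, not the exact values from the repo's training):

```python
import torch
from timm.scheduler import CosineLRScheduler

model = torch.nn.Linear(8, 4)  # placeholder; substitute the T2T-ViT model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)

scheduler = CosineLRScheduler(
    optimizer,
    t_initial=300,        # total training epochs (ImageNet default mentioned above)
    lr_min=1e-5,          # floor of the cosine decay
    warmup_t=5,           # linear warmup epochs
    warmup_lr_init=1e-6,  # starting lr for the warmup
)

for epoch in range(300):
    # ... run one training epoch here ...
    scheduler.step(epoch + 1)  # timm schedulers are stepped with the epoch index
```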

huixiancheng commented 3 years ago

Thank you very much! :bow: