yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Larger models learn slowly? #50

Closed chenwydj closed 3 years ago

chenwydj commented 3 years ago

Dear authors,

Thank you very much for this great repo!

I am training larger models (T2T-ViT-19/24, etc.), and I find that during training their accuracies increase more slowly than those of small models like T2T-ViT-7. Is this expected behavior?

Thank you!

yuanli2333 commented 3 years ago

Hi, larger models converge more slowly during the first 10 to 20 epochs, but their accuracy increases faster after this initial stage.
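
For reference, a minimal sketch of how the early-epoch behavior of a small and a large model could be compared side by side, assuming each training run writes a timm-style `summary.csv` with `epoch` and `eval_top1` columns (the column names and output paths below are assumptions, not the repo's guaranteed layout):

```python
# Minimal sketch: plot per-epoch validation top-1 accuracy for a small and a
# large T2T-ViT run to check the "slower start, faster later" behavior.
# Assumes each run directory contains a timm-style summary.csv with
# "epoch" and "eval_top1" columns (column names are assumptions).
import pandas as pd
import matplotlib.pyplot as plt

runs = {
    "T2T-ViT-7": "output/t2t_vit_7/summary.csv",    # hypothetical paths
    "T2T-ViT-19": "output/t2t_vit_19/summary.csv",
}

for name, path in runs.items():
    df = pd.read_csv(path)
    plt.plot(df["epoch"], df["eval_top1"], label=name)

plt.xlabel("epoch")
plt.ylabel("validation top-1 accuracy (%)")
plt.title("Early-epoch convergence: small vs. large T2T-ViT")
plt.legend()
plt.savefig("convergence_comparison.png")
```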

chenwydj commented 3 years ago

Thank you!