raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License
551 stars, 69 forks

Gradient accumulation code implementation #43

Open King4819 opened 7 months ago

King4819 commented 7 months ago

How do you implement gradient accumulation when training the DeiT model? I cannot find any other resources online that use gradient accumulation for DeiT training, and I would like to use it to train DeiT from scratch. Thanks!
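For readers unfamiliar with the technique being asked about, here is a minimal, framework-free sketch of gradient accumulation. It is not the repository's implementation, and the thread does not say whether the DeiT or DynamicViT training scripts support it. The toy model (`y_hat = w * x` with a mean-squared-error loss) and the helper names `mse_grad`, `accumulated_grad`, and `sgd_step` are all hypothetical. The sketch only shows that summing micro-batch gradients, each scaled by `1 / accum_steps`, gives the same gradient as one large batch.

```python
# Framework-free illustration of gradient accumulation (hypothetical toy model).
# Model: y_hat = w * x, loss = mean squared error over the batch.

def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches.

    Each micro-batch gradient is scaled by 1 / accum_steps so that the
    running sum equals the gradient of the mean loss over the full batch.
    """
    assert len(xs) % micro_batch_size == 0, "batch must split evenly"
    accum_steps = len(xs) // micro_batch_size
    grad = 0.0
    for start in range(0, len(xs), micro_batch_size):
        micro_x = xs[start:start + micro_batch_size]
        micro_y = ys[start:start + micro_batch_size]
        grad += mse_grad(w, micro_x, micro_y) / accum_steps
    return grad


def sgd_step(w, grad, lr):
    """One plain SGD update, applied once after all micro-batches."""
    return w - lr * grad


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # targets generated with true weight 2.0
w = 0.5

full = mse_grad(w, xs, ys)                               # one batch of 8
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # 4 micro-batches of 2
w_next = sgd_step(w, accum, lr=0.01)
```

In a PyTorch training loop the same idea is usually expressed by dividing the loss by the number of accumulation steps before `loss.backward()`, then calling `optimizer.step()` and `optimizer.zero_grad()` only once every `accum_steps` iterations. How this interacts with mixed precision, learning-rate schedules, or batch-statistics layers depends on the training script and is not covered here.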