raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License

Implementation details differ largely from the paper description #23

Closed ming1993li closed 2 years ago

raoyongming commented 2 years ago

Hi @ming1993li,

Thanks for your interest in our work. Could you please point out which part is different?

ming1993li commented 2 years ago

Hi, for example, the paper says the learning rate is 1e-3 / 1024 * batch_size. I cannot find the corresponding implementation in `main.py`.

raoyongming commented 2 years ago

@ming1993li The learning rate is adjusted manually via the command line. For example, our ConvNeXt-T model is trained on 4x 8-GPU nodes with a per-GPU batch size of 128, so we set the learning rate to 4e-3. If the model is trained on a single node, we set the update_frequency to 4 so that the effective batch size matches the 4-node setting. For the DeiT-S model, which is trained on a single node with a global batch size of 1024, we set the learning rate to 1e-3 in the command line. All of these commands can be found in the README.md.
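To make the arithmetic behind these command-line values explicit, here is a small sketch (not code from the repo; the function name and defaults are illustrative) of the linear scaling rule lr = 1e-3 / 1024 * effective_batch_size, where the effective batch size is per-GPU batch size × GPUs per node × nodes × update frequency:

```python
def scaled_lr(batch_size_per_gpu, gpus_per_node=8, nodes=1,
              update_frequency=1, base_lr=1e-3, base_batch_size=1024):
    """Learning rate implied by the paper's linear scaling rule."""
    effective_batch = (batch_size_per_gpu * gpus_per_node
                       * nodes * update_frequency)
    return base_lr / base_batch_size * effective_batch

# ConvNeXt-T: 4 nodes x 8 GPUs x 128 per GPU -> effective batch 4096
print(scaled_lr(128, gpus_per_node=8, nodes=4))  # 0.004

# DeiT-S: 1 node x 8 GPUs x 128 per GPU -> effective batch 1024
print(scaled_lr(128, gpus_per_node=8, nodes=1))  # 0.001
```

A single-node run with `update_frequency=4` gives the same effective batch size, and hence the same learning rate, as the 4-node run.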

ming1993li commented 2 years ago

Thanks