hustvl / Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Apache License 2.0

Training Recipe for Vim-B (base model) #13

Open mhamzaerol opened 5 months ago

mhamzaerol commented 5 months ago

Hello,

Thank you very much for the insightful work detailed in your paper!

The training recipe (e.g., learning rate, scheduler, number of epochs) and results for the Vim-T and Vim-S models were quite informative.

I am interested in adapting this approach to the base model (Vim-B). Could you provide guidance on any modifications needed in the training recipe for this purpose? For concreteness, I have sketched my planned starting point below.
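Here is a minimal sketch of the settings I intend to start from, simply carried over from a common DeiT-style ImageNet-1K recipe. The model name and every value below are my own assumptions rather than anything from your code or paper, so please correct whatever should change for the base model.

```python
# Hypothetical Vim-B starting recipe (my assumptions, not the authors' settings).
vim_b_recipe = {
    "model": "vim_base_patch16_224",  # assumed model name; not confirmed in the repo
    "epochs": 300,
    "batch_size": 1024,               # global batch size across all GPUs
    "opt": "adamw",
    "lr": 1e-3,                       # base LR; larger models often need this lowered
    "weight_decay": 0.05,
    "sched": "cosine",                # cosine decay after warmup
    "warmup_epochs": 5,
    "drop_path": 0.4,                 # stochastic depth is typically raised for base-size models
}

if __name__ == "__main__":
    # Print the planned configuration for easy comparison/correction.
    for key, value in vim_b_recipe.items():
        print(f"{key:>15}: {value}")
```

In particular, I would like to know whether the learning rate, stochastic depth rate, or regularization strength should be adjusted relative to the Vim-S recipe.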

Additionally, if you have results for the base model on ImageNet-1K, I would greatly appreciate it if you could share them. This would allow me to benchmark my training outcomes against yours for a comprehensive comparison.

Thank you once again for your valuable contributions!

xiaomengxin123 commented 4 months ago

+1

jsrdcht commented 1 month ago

I'm also interested.