hustvl / Vim

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Apache License 2.0

Update models_mamba.py #90

Open chuc92man opened 1 month ago

chuc92man commented 1 month ago

While training Vision Mamba in bidirectional mode inside a masked-autoencoder network, I ran into NaN loss. Switching from mixed-precision to full-precision training fixed the problem, but it nearly doubled training time. Looking at the code, summing the forward and backward hidden_states/residuals doubles their magnitude relative to the original hidden states (after patch embedding). Dividing the sum by 2 resolved the NaN loss and let mixed-precision training continue, as the sketch below illustrates.
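
For reference, here is a minimal sketch of the proposed change, assuming the bidirectional branch runs the backward pass on the sequence-reversed input and combines it with the forward pass along the sequence dimension (dim 1 of a `(B, L, D)` tensor). The helper name and signature are hypothetical, not the actual `models_mamba.py` code:

```python
import torch

def combine_bidirectional(hidden_states_f: torch.Tensor, residual_f: torch.Tensor,
                          hidden_states_b: torch.Tensor, residual_b: torch.Tensor):
    """Hypothetical helper illustrating the fix: average, rather than sum,
    the forward- and backward-branch outputs.

    The backward-branch tensors are assumed to have been computed on the
    sequence-reversed input, so they are flipped back before combining.
    """
    # Summing the two branches doubles the activation magnitude relative
    # to the patch-embedded input; under fp16 autocast this can overflow
    # to inf/NaN. Dividing by 2 keeps the combined output at the same
    # scale as the original hidden states.
    hidden_states = (hidden_states_f + hidden_states_b.flip([1])) / 2
    residual = (residual_f + residual_b.flip([1])) / 2
    return hidden_states, residual
```

Averaging preserves the activation scale expected by the downstream layers, which is why the fp16 dynamic range is no longer exceeded during mixed-precision training.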