Open JiuyangDong opened 2 weeks ago
Is this due to randomness inherent in Mamba itself?
Sorry for the late reply.
For the instability of Mamba, see https://github.com/state-spaces/mamba/issues/137. You can try a smaller learning rate: when you run the code multiple times, this may let each run reach the same local optimum before the backward pass diverges.
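To illustrate the learning-rate point, here is a toy sketch in plain Python. It is not Mamba code and not taken from this repo: it just runs gradient descent on f(w) = w**2, and the tiny difference between the two starting points is an assumed stand-in for run-to-run numerical noise. The function name and the values 1.1 and 0.1 are made up for the example.

```python
def run_gradient_descent(lr, w0, steps=50):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # each step multiplies w by (1 - 2 * lr)
    return w


# Two "runs" whose starting points differ slightly (assumed stand-in for noise).
starts = [1.0, 1.0 + 1e-6]

# lr = 1.1 gives |1 - 2 * lr| = 1.2 > 1, so the iterates grow and the gap between runs grows too.
large = [run_gradient_descent(1.1, w0) for w0 in starts]

# lr = 0.1 gives |1 - 2 * lr| = 0.8 < 1, so both runs shrink toward the same optimum at w = 0.
small = [run_gradient_descent(0.1, w0) for w0 in starts]

print("lr=1.1:", large, "gap:", abs(large[0] - large[1]))
print("lr=0.1:", small, "gap:", abs(small[0] - small[1]))
```

With the larger step the two runs blow up and drift apart, while with the smaller step they both settle at the same minimum. Real training is far less clean than this, so treat it only as intuition for why lowering the learning rate may help.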