state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

Question about d_state. #531

Open CacatuaAlan opened 1 month ago

CacatuaAlan commented 1 month ago

I have a 4-stage network, and considering that each stage has a different number of tokens, I want to set different sizes for d_state, e.g., [256, 128, 64, 32]. However, I noticed that the training time has significantly slowed down. Is this a normal phenomenon? Does changing d_state in this way make sense?
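The slowdown is plausible: the selective-scan work in a Mamba block grows with d_state, so large values in the early, token-heavy stages dominate the cost. A rough back-of-the-envelope sketch (all stage sizes below are hypothetical, and the cost model is the simplified approximation that scan work scales as seq_len × d_model × d_state, not a measurement of the actual kernel):

```python
# Rough cost model: assume selective-scan work per Mamba block scales
# roughly as seq_len * d_model * d_state. This is a simplification for
# illustration, not the exact FLOP count of the CUDA kernel.

def scan_cost(seq_len, d_model, d_state):
    """Approximate relative scan cost for one Mamba block."""
    return seq_len * d_model * d_state

# Hypothetical 4-stage hierarchy: token count shrinks per stage while
# d_state follows the schedule from the question, [256, 128, 64, 32].
stages = [
    dict(seq_len=4096, d_model=96,  d_state=256),
    dict(seq_len=1024, d_model=192, d_state=128),
    dict(seq_len=256,  d_model=384, d_state=64),
    dict(seq_len=64,   d_model=768, d_state=32),
]

# Baseline: same stages but with the common default d_state=16 everywhere.
baseline = [dict(s, d_state=16) for s in stages]

custom_total = sum(scan_cost(**s) for s in stages)
base_total = sum(scan_cost(**s) for s in baseline)
print(f"relative scan cost vs d_state=16 everywhere: {custom_total / base_total:.1f}x")
# -> relative scan cost vs d_state=16 everywhere: 11.3x
```

Under these assumed shapes, the first stage alone (many tokens, d_state=256) accounts for most of the extra cost, which matches the observation that training slows down markedly.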

Aristo23333 commented 1 month ago

Hi! I also found that this change noticeably affects training speed. This paper gives a theoretical discussion you could refer to: https://arxiv.org/abs/2407.07279. What really interests me is that, from this perspective, although the training cost grows, the model should converge faster. In my opinion, that would mean fewer epochs but better test accuracy? In the end, though, I couldn't get significantly better performance...