state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

[Summary] Ability to pass in initial state #175

Open CompRhys opened 7 months ago

CompRhys commented 7 months ago

Several issues have requested the ability to pass in an initial state, but many of them were closed by their authors without the request being resolved. This issue simply collates those earlier issues so that the maintainers' responses to them are easier to find when searching open issues.

https://github.com/state-spaces/mamba/issues/155
https://github.com/state-spaces/mamba/issues/146
https://github.com/state-spaces/mamba/issues/141
https://github.com/state-spaces/mamba/issues/127
https://github.com/state-spaces/mamba/issues/101

TL;DR: this functionality is a work in progress.

CompRhys commented 6 months ago

https://github.com/state-spaces/mamba/issues/258

SamPruden commented 5 months ago

I'd also benefit a lot from this feature: I have multiple training sequences with long common prefixes, and I'd like to run the model over each prefix once, then fork the state for each continuation. Because this would be during training, contrary to what @albertfgu said in #101, I would need gradients to flow through the pause/resume process.
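To make the prefix-forking idea concrete, here is a toy sketch. It does not use the mamba package or its CUDA kernels: `diag_ssm_scan` is a hypothetical helper implementing a plain diagonal linear recurrence in pure Python (no input-dependent selection, no discretization step). It only illustrates why an initial-state argument is enough for this use case: running the shared prefix once and starting each continuation from the prefix's final state gives the same outputs as running prefix + continuation from scratch.

```python
from typing import List, Optional, Sequence, Tuple


def diag_ssm_scan(
    xs: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    h0: Optional[Sequence[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Hypothetical toy diagonal recurrence (not the mamba API).

    h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
    y_t    = sum_i c[i] * h_t[i]

    Returns (outputs, final_state). `h0` is the initial state; zeros if None.
    """
    n = len(a)
    # Copy the initial state so a caller's list is never mutated.
    h = list(h0) if h0 is not None else [0.0] * n
    ys: List[float] = []
    for x in xs:
        h = [a[i] * h[i] + b[i] * x for i in range(n)]
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys, h


# Toy parameters and sequences (made up for illustration).
a, b, c = [0.9, 0.5], [1.0, 0.3], [0.2, -1.0]
prefix = [1.0, 2.0, 3.0]
continuations = [[0.5, -1.0], [4.0]]

# Run the shared prefix once and keep its final state.
_, prefix_state = diag_ssm_scan(prefix, a, b, c)

for cont in continuations:
    # "Fork": each continuation starts from the saved prefix state.
    forked, _ = diag_ssm_scan(cont, a, b, c, h0=prefix_state)
    # Reference: run prefix + continuation from a zero state.
    full, _ = diag_ssm_scan(prefix + cont, a, b, c)
    assert all(abs(u - v) < 1e-12 for u, v in zip(forked, full[len(prefix):]))
```

In a differentiable framework the same pattern would let gradients from every continuation flow back into the prefix through `h0`, which is the training requirement described above; this pure-Python sketch does not demonstrate autodiff.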

radarFudan commented 5 months ago

Actually, if you don't mind it being about 10x slower and using 2x the GPU memory, there is a workaround for now: https://github.com/state-spaces/mamba/issues/51

But I suspect a proper Mamba implementation with initial hidden states will need a CUDA expert to update the kernels.