state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

Mamba for encoder-decoder #515

Open simple-boy opened 1 month ago

simple-boy commented 1 month ago

I am trying to combine Mamba and attention for an image-captioning task, but adding Mamba in the decoder always leads to overfitting. Are there any examples of Mamba used in a decoder for reference?
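For reference, here is a minimal sketch of the setup described above: a decoder layer that interleaves a Mamba-style sequence mixer with cross-attention over encoder image features. Everything here is illustrative, not from this repo: `MambaBlock` is a stand-in (a gated causal convolution so the sketch runs on CPU; in practice you would drop in `mamba_ssm.Mamba`), and `MambaDecoderLayer` and all hyperparameters are hypothetical names chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MambaBlock(nn.Module):
    """Placeholder causal sequence mixer (NOT the real Mamba SSM)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        # Depthwise conv with left padding, trimmed to length -> causal.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4,
                              padding=3, groups=d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (B, L, D)
        z, gate = self.in_proj(x).chunk(2, dim=-1)
        z = self.conv(z.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.out_proj(F.silu(z) * torch.sigmoid(gate))

class MambaDecoderLayer(nn.Module):
    """Mamba-style mixing over the caption, cross-attention to image features."""
    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.3):
        super().__init__()
        self.mixer = MambaBlock(d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                dropout=dropout,
                                                batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Fairly heavy dropout: one knob to try against the overfitting above.
        self.drop = nn.Dropout(dropout)

    def forward(self, tgt, memory):
        tgt = tgt + self.drop(self.mixer(self.norm1(tgt)))
        attn_out, _ = self.cross_attn(self.norm2(tgt), memory, memory)
        return tgt + self.drop(attn_out)

dec = MambaDecoderLayer(d_model=64, n_heads=4)
captions = torch.randn(2, 16, 64)   # (batch, caption length, d_model)
features = torch.randn(2, 49, 64)   # e.g. flattened 7x7 patch features
out = dec(captions, features)
print(out.shape)                    # torch.Size([2, 16, 64])
```

The pre-norm residual layout mirrors how Mamba blocks are usually stacked; swapping the placeholder mixer for the real `Mamba` module keeps the rest of the layer unchanged.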

tdh8316 commented 1 month ago

I have also tried to implement an encoder-decoder structure using Mamba. It performs at least better than an LSTM, but not better than a Transformer. According to https://github.com/state-spaces/mamba/issues/78, this is still an open question.

dumpmemory commented 1 month ago

How about checking this one: https://goombalab.github.io/blog/2024/hydra-part2-model/