simple-boy opened this issue 1 month ago
I have also tried to implement an encoder-decoder structure using Mamba. It performs better than an LSTM (at least), but not as well as Transformers. According to https://github.com/state-spaces/mamba/issues/78, it's still an open question.
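For reference, here is a rough sketch of the kind of skeleton I mean (not my exact implementation). It assumes the `mamba_ssm` package's `Mamba` block, which takes and returns tensors of shape `(batch, length, d_model)`, and it conditions the decoder by simply prepending the encoder states as a prefix to the target sequence; that prefix scheme is just one possible choice for connecting the two sides.

```python
# Minimal sketch of a Mamba encoder-decoder (prefix conditioning), for illustration only.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the official mamba_ssm package is installed


class MambaEncoderDecoder(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 256, n_layers: int = 4):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, d_model)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.ModuleList(Mamba(d_model=d_model) for _ in range(n_layers))
        self.decoder = nn.ModuleList(Mamba(d_model=d_model) for _ in range(n_layers))
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        x = self.src_embed(src_ids)               # (B, S, D)
        for layer in self.encoder:
            x = x + layer(x)                      # residual Mamba blocks
        y = self.tgt_embed(tgt_ids)               # (B, T, D)
        # Prefix conditioning: the decoder scans the encoder states before the targets.
        h = torch.cat([x, y], dim=1)              # (B, S+T, D)
        for layer in self.decoder:
            h = h + layer(h)
        return self.lm_head(h[:, x.size(1):])     # logits for the target positions only
```

In practice you would also add layer norms, dropout, and a proper loss/shift for teacher forcing; this is just the skeleton.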
How about checking this one: https://goombalab.github.io/blog/2024/hydra-part2-model/
I am trying to combine Mamba and attention for an image captioning task, but adding Mamba to the decoder always leads to overfitting. Are there any examples of Mamba used in a decoder for reference? The sketch below shows the kind of block I am experimenting with.
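This is only an illustrative sketch, not code from any published captioning model: a decoder block where a `Mamba` layer does the causal token mixing and a standard cross-attention layer reads the image features. The names (`HybridCaptionDecoderBlock`, `image_feats`, the dimensions) are mine; dropout and weight decay are the usual first knobs I try against the overfitting.

```python
# Illustrative hybrid decoder block: Mamba for token mixing, cross-attention over image features.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the official mamba_ssm package is installed


class HybridCaptionDecoderBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, dropout: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mamba = Mamba(d_model=d_model)   # causal mixing of caption tokens
        self.norm2 = nn.LayerNorm(d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                dropout=dropout, batch_first=True)
        self.norm3 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Dropout(dropout), nn.Linear(4 * d_model, d_model))
        self.drop = nn.Dropout(dropout)       # heavier dropout is one lever against overfitting

    def forward(self, captions: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # captions: (B, T, D) caption token states; image_feats: (B, N, D) visual tokens
        x = captions + self.drop(self.mamba(self.norm1(captions)))
        attn_out, _ = self.cross_attn(self.norm2(x), image_feats, image_feats)
        x = x + self.drop(attn_out)
        x = x + self.drop(self.mlp(self.norm3(x)))
        return x
```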