state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

Mamba in Long range arena (LRA) #282

Antidotec commented 3 months ago

May I ask if anyone has tested Mamba on the LRA tasks? I tried replacing the S4 network structure with the MixerModel provided by this repository, but the results were poor on some tasks. Do you have any suggestions?
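
For context, this is roughly the setup I mean, as a minimal sketch. Only the `Mamba` block comes from this repo; the classifier wrapper, pre-norm residuals, mean pooling, and hyperparameters are my own illustrative choices:

```python
import torch.nn as nn
from mamba_ssm import Mamba  # the Mamba block needs the CUDA kernels installed

class MambaClassifier(nn.Module):
    """Token sequence -> class logits, with Mamba blocks where S4 blocks were."""
    def __init__(self, vocab_size, n_classes, d_model=128, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_layers)])
        self.mixers = nn.ModuleList([Mamba(d_model=d_model) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):              # tokens: (batch, seq_len), int64
        x = self.embed(tokens)              # (batch, seq_len, d_model)
        for norm, mixer in zip(self.norms, self.mixers):
            x = x + mixer(norm(x))          # pre-norm residual block
        return self.head(x.mean(dim=1))     # mean-pool over length, then classify
```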

pragyasrivastava0805 commented 2 months ago

@Antidotec, can you tell me how to construct the inference pipeline for LRA on Mamba models?

ngdxzy commented 3 weeks ago

I am doing the same experiment on Pathfinder, and I also find that the model doesn't train...

msjun23 commented 2 weeks ago

I am doing exactly the same thing: I replaced the S4 block in the S4 architecture with Mamba's selective SSM. So far I am getting very low results on LRA-ListOps and Text, and the interim results for Retrieval (still training) do not look good.

albertfgu commented 2 weeks ago

We did not try LRA with Mamba. We don't believe that it's a good dataset, e.g. see: https://openreview.net/forum?id=PdaPky8MUn

With that said, in early versions I quickly tested the Retrieval (AAN) dataset. As discussed at the end of the Mamba paper, we believe it should do well on data such as text and less well on data such as images (e.g. the Image/Pathfinder tasks). IIRC it performed fine on Retrieval, comparable to S4.

Another approach you can consider is a hybrid of different SSMs, e.g. interleaving S4 and Mamba blocks.
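
For example, a rough sketch of the interleaving (the `make_s4_layer` factory is a hypothetical stand-in for whatever S4 implementation you already use, assumed to map `(batch, seq_len, d_model)` to the same shape):

```python
import torch.nn as nn
from mamba_ssm import Mamba

def build_hybrid_stack(d_model, n_pairs, make_s4_layer):
    """Alternate S4 and Mamba blocks. `make_s4_layer(d_model)` is a
    hypothetical factory for your own S4 layer, wrapped to map
    (batch, seq_len, d_model) -> (batch, seq_len, d_model)."""
    layers = []
    for _ in range(n_pairs):
        layers.append(make_s4_layer(d_model))  # LTI SSM block
        layers.append(Mamba(d_model=d_model))  # selective (input-dependent) SSM block
    return nn.ModuleList(layers)
```

Each block would still get the usual norm/residual wrapping around it.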

ngdxzy commented 2 weeks ago

> We did not try LRA with Mamba. We don't believe that it's a good dataset, e.g. see: https://openreview.net/forum?id=PdaPky8MUn
>
> With that said, in early versions I quickly tested the Retrieval (AAN) dataset. As discussed at the end of the Mamba paper, we believe it should do well on data such as text and less well on data such as images (e.g. the Image/Pathfinder tasks). IIRC it performed fine on Retrieval, comparable to S4.
>
> Another approach you can consider is a hybrid of different SSMs, e.g. interleaving S4 and Mamba blocks.

Thanks for replying! That is exactly what I observed. I also find that Mamba trains more slowly on the Copying dataset than S4D, while S4D fails on Selective Copying.
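
For reference, this is roughly the selective-copying data I generate for the comparison (my own minimal sketch, not the paper's exact setup):

```python
import torch

def selective_copying_batch(batch, seq_len=256, n_tokens=16, vocab=10, noise=0):
    """Content tokens (1..vocab) are scattered at random positions in a
    noise-filled sequence; the target is to reproduce them in order.
    Plain Copying is the easier variant where the positions are fixed."""
    x = torch.full((batch, seq_len), noise, dtype=torch.long)
    y = torch.randint(1, vocab + 1, (batch, n_tokens))
    for b in range(batch):
        pos = torch.randperm(seq_len)[:n_tokens].sort().values
        x[b, pos] = y[b]
    return x, y
```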