Antidotec opened this issue 3 months ago
@Antidotec, can you tell me how to construct the inference pipeline for LRA on Mamba models?
I am doing the same experiment on Pathfinder, and I also find that the model doesn't train...
I am doing exactly the same thing. I replaced the S4 block in the S4 architecture with Mamba's selective SSM. Currently I am getting very low results on LRA ListOps and Text, and the interim results on Retrieval (still training) do not look good either.
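For concreteness, this is roughly the wrapper I used. It is only a minimal sketch, not an official LRA pipeline: `MambaClassifier`, the pre-norm residual wiring, the mean pooling, and all hyperparameters are my own choices; the only piece that comes from this repository is `mamba_ssm.Mamba`.

```python
import torch.nn as nn

from mamba_ssm import Mamba  # selective SSM block from this repo


class MambaClassifier(nn.Module):
    """Hypothetical LRA-style classifier: embed -> stacked Mamba blocks -> pool -> head."""

    def __init__(self, vocab_size, num_classes, d_model=128, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            [Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2) for _ in range(n_layers)]
        )
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, tokens):  # tokens: (batch, seq_len) int64
        x = self.embed(tokens)  # (batch, seq_len, d_model)
        for norm, layer in zip(self.norms, self.layers):
            x = x + layer(norm(x))  # pre-norm residual around each Mamba block
        return self.head(x.mean(dim=1))  # mean-pool over time, then classify
```

I train it with plain cross-entropy; note that the readout choice (mean pooling vs. last token) may itself affect results for a causal model, so that is another variable to check.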
We did not try LRA with Mamba. We don't believe that it's a good dataset, e.g. see: https://openreview.net/forum?id=PdaPky8MUn
With that said, in early versions I quickly tested the Retrieval (AAN) dataset. As discussed in the end of the Mamba paper, we believe it should be good on data such as text and not as good on data such as images (e.g. the Image/Pathfinder tasks). IIRC it performed pretty fine on Retrieval, comparable to S4.
Another approach you can consider is hybrids of different SSMs, e.g. interleaving S4 and Mamba blocks
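For anyone who wants to try that, here is a minimal sketch of what interleaving could look like. `s4_block_cls` is a placeholder for whatever S4 layer you use (e.g. the `S4Block` class from the state-spaces/s4 repo); it is not part of this repository, and the even/odd layout is just one arbitrary choice.

```python
import torch.nn as nn

from mamba_ssm import Mamba


def build_hybrid_stack(d_model, n_layers, s4_block_cls):
    """Alternate an S4 block (even layers) with a Mamba block (odd layers)."""
    blocks = []
    for i in range(n_layers):
        if i % 2 == 0:
            blocks.append(s4_block_cls(d_model))   # placeholder: any S4 layer class
        else:
            blocks.append(Mamba(d_model=d_model))  # selective SSM from this repo
    return nn.ModuleList(blocks)
```

One caveat: many S4 implementations expect `(batch, d_model, length)` and return an `(output, state)` tuple, while `Mamba` takes `(batch, length, d_model)`, so in practice you would wrap one of them in a small adapter before interleaving.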
Thanks for replying! That is exactly what I observed. I also find that Mamba trains more slowly on the Copying dataset than S4D, while S4D fails on Selective Copying.
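For anyone who wants to reproduce the comparison, the selective-copying data I used looks roughly like this. It is my own minimal generator following the task description in the Mamba paper; the sequence length, token count, and padding value are arbitrary choices.

```python
import torch


def selective_copying_batch(batch_size, seq_len=256, n_tokens=16, vocab_size=10, pad=0):
    """Place n_tokens random data tokens (values 1..vocab_size-1) at random
    positions in a pad-filled sequence; the target is those tokens in order."""
    data = torch.randint(1, vocab_size, (batch_size, n_tokens))
    x = torch.full((batch_size, seq_len), pad, dtype=torch.long)
    for b in range(batch_size):
        pos = torch.randperm(seq_len)[:n_tokens].sort().values  # sorted positions preserve order
        x[b, pos] = data[b]
    return x, data  # inputs: (batch, seq_len), targets: (batch, n_tokens)
```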
May I ask if anyone has tested Mamba on the LRA tasks? I tried replacing the S4 network structure with the MixerModel provided by this repository, but the results were not good on some tasks. Do you have any suggestions?
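Concretely, what I tried looks roughly like the following. `MixerModel` (in `mamba_ssm/models/mixer_seq_simple.py`) is written for language modeling and returns per-token hidden states, so I added a readout on top for classification. The constructor arguments match the version I used and may differ in yours, and the mean-pooling head is my own addition.

```python
import torch.nn as nn

from mamba_ssm.models.mixer_seq_simple import MixerModel


class MixerForClassification(nn.Module):
    def __init__(self, vocab_size, num_classes, d_model=128, n_layer=4):
        super().__init__()
        # NOTE: MixerModel's constructor is version-dependent (newer versions take
        # additional arguments); check your copy of mixer_seq_simple.py.
        self.backbone = MixerModel(d_model=d_model, n_layer=n_layer, vocab_size=vocab_size)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, input_ids):             # input_ids: (batch, seq_len)
        hidden = self.backbone(input_ids)     # (batch, seq_len, d_model)
        return self.head(hidden.mean(dim=1))  # pool over the sequence, then classify
```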