Status: Open. baiSongL opened 6 months ago
I am also facing a similar issue. Were you able to find the reason for it?
That is mainly because something in the implementation makes `loss.backward()` very slow.
It's not meant to be fast! This repo is mostly for educational purposes. I would suggest using the official repo for any training: https://github.com/state-spaces/mamba
Probably because mamba-ssm implements a GPU-adapted scan operation in custom CUDA/C++ kernels, rather than a Python loop.
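To illustrate why the pure-PyTorch version is slow, here is a minimal sketch (not the repo's actual code; shapes and the decay constant are made up for illustration). A Python `for` loop over the sequence, like the selective scan in mamba-minimal, launches one small op per timestep, and `backward()` has to replay that whole per-step graph; the official mamba-ssm fuses the entire scan into a single CUDA kernel instead.

```python
import time
import torch

def sequential_scan(u, a):
    # Naive linear recurrence x_t = a * x_{t-1} + u_t, one small op per
    # step -- similar in spirit to the Python loop in a pure-PyTorch scan.
    x = torch.zeros_like(u[:, 0])
    outs = []
    for t in range(u.shape[1]):
        x = a * x + u[:, t]
        outs.append(x)
    return torch.stack(outs, dim=1)

# Illustrative sizes only (assumptions, not mamba-minimal's defaults).
batch, seqlen, dim = 8, 512, 64
u = torch.randn(batch, seqlen, dim, requires_grad=True)
a = torch.full((dim,), 0.9)

start = time.perf_counter()
y = sequential_scan(u, a)
y.sum().backward()  # backward also walks the 512-step graph
elapsed = time.perf_counter() - start
print(f"{seqlen}-step Python-loop scan fwd+bwd: {elapsed:.3f}s")
```

Each of the 512 iterations is a tiny kernel launch plus autograd bookkeeping, which is where the overhead comes from on GPU.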
I think the problem is the embedding layer. Its dimension is 50280 (the vocabulary size), lol.
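For context on the embedding point: 50280 is the GPT-NeoX vocabulary size used by the Mamba checkpoints, so the embedding table is large in parameters, though an embedding lookup itself is rarely the training bottleneck. A quick back-of-the-envelope (the `d_model` value below is an assumption, roughly matching mamba-130m):

```python
vocab_size = 50280   # GPT-NeoX tokenizer vocabulary used by Mamba checkpoints
d_model = 768        # assumption: hidden size in the mamba-130m range

# One parameter per (token, dimension) pair in the lookup table.
embed_params = vocab_size * d_model
print(f"embedding parameters: {embed_params:,}")  # prints 38,615,040
```

So tens of millions of parameters sit in the embedding (and the tied output head), but that cost is shared by the official implementation too, so it does not explain the speed gap with mamba-ssm.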
I haven't run the official version of Mamba, but I've run your implementation, and its training speed seems much slower than that of a comparable Transformer.