Status: Open. baiSongL opened 6 months ago
I am also facing a similar issue. Were you able to find the reason for it?
That is mainly because something in the implementation makes `loss.backward()` very slow.
It's not meant to be fast! This repo is mostly for educational purposes. I would suggest using the official repo for any training: https://github.com/state-spaces/mamba
Probably because mamba-ssm implements a GPU-adapted scan operation in custom CUDA/C++ kernels, rather than a Python loop.
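To illustrate why the pure-PyTorch version is slow, here is a minimal sketch (not the repo's actual code; shapes and the decay constant are made up for illustration). A Python `for` loop over the sequence, like the selective scan in mamba-minimal, launches one small op per timestep, and `backward()` has to replay that whole per-step graph; the official mamba-ssm fuses the entire scan into a single CUDA kernel instead.

```python
import time
import torch

def sequential_scan(u, a):
    # Naive linear recurrence x_t = a * x_{t-1} + u_t, one small op per
    # step -- similar in spirit to the Python loop in a pure-PyTorch scan.
    x = torch.zeros_like(u[:, 0])
    outs = []
    for t in range(u.shape[1]):
        x = a * x + u[:, t]
        outs.append(x)
    return torch.stack(outs, dim=1)

# Illustrative sizes only (assumptions, not mamba-minimal's defaults).
batch, seqlen, dim = 8, 512, 64
u = torch.randn(batch, seqlen, dim, requires_grad=True)
a = torch.full((dim,), 0.9)

start = time.perf_counter()
y = sequential_scan(u, a)
y.sum().backward()  # backward also walks the 512-step graph
elapsed = time.perf_counter() - start
print(f"{seqlen}-step Python-loop scan fwd+bwd: {elapsed:.3f}s")
```

Each of the 512 iterations is a tiny kernel launch plus autograd bookkeeping, which is where the overhead comes from on GPU.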
I think the problem is the embedding layer. Its dimension is 50280 (the vocabulary size), lol.
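For context on the embedding point: 50280 is the GPT-NeoX vocabulary size used by the Mamba checkpoints, so the embedding table is large in parameters, though an embedding lookup itself is rarely the training bottleneck. A quick back-of-the-envelope (the `d_model` value below is an assumption, roughly matching mamba-130m):

```python
vocab_size = 50280   # GPT-NeoX tokenizer vocabulary used by Mamba checkpoints
d_model = 768        # assumption: hidden size in the mamba-130m range

# One parameter per (token, dimension) pair in the lookup table.
embed_params = vocab_size * d_model
print(f"embedding parameters: {embed_params:,}")  # prints 38,615,040
```

So tens of millions of parameters sit in the embedding (and the tied output head), but that cost is shared by the official implementation too, so it does not explain the speed gap with mamba-ssm.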
I haven't run the official version of Mamba, but I've run your implementation, and its training speed seems much slower than that of a comparable Transformer.