edofazza opened 6 months ago
I am also interested in the answer to this question. Say we have a model with a Transformer-based backbone and we want to integrate the Mamba module into it. Do we use Mamba in place of the whole Transformer layer, or do we swap out only the attention layer?
I want to modify an architecture that passes a tensor x of size (8, 16, 512) (the first value is the batch size) and a query_embed parameter of size (8, 140, 512) through a torch.nn.Transformer layer with d_model set to 512 and all other parameters equal to those in the paper "Attention Is All You Need", replacing it with Mamba layers. How can I build this correspondence between the Transformer layer and a Mamba-based architecture? Thank you.
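For reference, here is a minimal sketch of one possible mapping, not a definitive answer. It assumes batch-first tensors with the shapes described above and the `Mamba` block from the `mamba_ssm` package (https://github.com/state-spaces/mamba), which typically requires a CUDA GPU. Since Mamba has no cross-attention, the encoder-decoder call of `torch.nn.Transformer` has no one-to-one equivalent; the sketch instead concatenates the source tokens and the queries along the sequence dimension, runs the combined sequence through a stack of Mamba blocks with pre-norm residual connections, and slices out the query positions as the "decoder" output. Swapping only the attention sublayer inside each Transformer layer (keeping the feed-forward parts) is the other common option mentioned above.

```python
# Hypothetical sketch: replacing an encoder-decoder torch.nn.Transformer
# with a stack of Mamba blocks by concatenating source tokens and queries.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip install mamba-ssm (needs CUDA kernels)


class MambaStack(nn.Module):
    """One possible Mamba-based stand-in for the described Transformer layer."""

    def __init__(self, d_model=512, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(Mamba(d_model=d_model) for _ in range(n_layers))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))

    def forward(self, x, query_embed):
        # x:           (B, 16, 512)  source tokens
        # query_embed: (B, 140, 512) decoder-style queries
        seq = torch.cat([x, query_embed], dim=1)        # (B, 156, 512)
        for norm, layer in zip(self.norms, self.layers):
            seq = seq + layer(norm(seq))                # pre-norm residual Mamba block
        return seq[:, x.size(1):]                       # (B, 140, 512), like the decoder output


# Usage with the shapes from the question:
model = MambaStack(d_model=512, n_layers=6).cuda()
x = torch.randn(8, 16, 512, device="cuda")
query_embed = torch.randn(8, 140, 512, device="cuda")
out = model(x, query_embed)  # (8, 140, 512)
```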