state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

deriving embeddings #13

Open ekg opened 6 months ago

ekg commented 6 months ago

What would be the best way to derive embeddings from Mamba models? Is there a straightforward approach, or would we need a new architecture?

albertfgu commented 6 months ago

I'm not sure which form of deriving embeddings you're thinking of. For most ways that you could apply a Transformer, you could also apply Mamba. One exception may be if you need an explicitly bidirectional model instead of a causal one, e.g. for MLM (BERT-style) pretraining. We're working on the proper way to do this, but you could always just concatenate or add two copies of Mamba (one running in the reverse direction), just like how this used to be handled with RNNs.
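
For the plain embedding case, here is a minimal sketch of pooling the backbone's hidden states from a pretrained checkpoint. The checkpoint name, the GPT-NeoX tokenizer, and the pooling choices (last token vs. mean) are just illustrative assumptions, not an official recipe:

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device=device)
model.eval()

text = "Mamba is a selective state space model."
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    # The backbone returns hidden states before the LM head, shape (B, L, d_model).
    hidden_states = model.backbone(input_ids)

# Because the model is causal, the final position has seen the whole sequence;
# mean pooling over positions is another common choice.
last_token_embedding = hidden_states[:, -1, :]  # (B, d_model)
mean_embedding = hidden_states.mean(dim=1)      # (B, d_model)
```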
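
And a rough sketch of the RNN-style bidirectional workaround (two Mamba blocks, one run on the flipped sequence, outputs added; the class name and the add-vs-concat choice are arbitrary, and this is not the proper bidirectional variant we're working on):

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class BiMambaBlock(nn.Module):
    """Ad-hoc bidirectional wrapper: sum of a forward and a reversed Mamba pass."""

    def __init__(self, d_model, **mamba_kwargs):
        super().__init__()
        self.fwd = Mamba(d_model=d_model, **mamba_kwargs)
        self.bwd = Mamba(d_model=d_model, **mamba_kwargs)

    def forward(self, x):  # x: (B, L, D)
        out_fwd = self.fwd(x)
        # Run the second copy on the time-reversed sequence, then flip back.
        out_bwd = torch.flip(self.bwd(torch.flip(x, dims=[1])), dims=[1])
        return out_fwd + out_bwd  # adding here; concatenation is the other option

block = BiMambaBlock(d_model=16, d_state=16, d_conv=4, expand=2).to("cuda")
x = torch.randn(2, 64, 16, device="cuda")
y = block(x)
assert y.shape == x.shape
```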

ddofer commented 5 months ago

A bidirectional (BERT/MLM or ELECTRA/RTD) pretraining setup with Mamba would be amazing!

albertfgu commented 5 months ago

It's coming soon!