stanleyshly opened this issue 5 months ago
There's no encoder-decoder variant as far as I know; you can try concatenation.
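For what it's worth, the concatenation framing usually means formatting each (source, target) pair as one sequence and masking the loss over the source portion, so a decoder-only model like Mamba learns the mapping via next-token prediction. Here is a minimal sketch of that data formatting; the SEP/PAD token IDs and the -100 ignore index are illustrative assumptions, not anything Mamba-specific:

```python
# Sketch: framing a seq-to-seq task for a decoder-only model (e.g. Mamba)
# by concatenating source and target into one training sequence.
# SEP, PAD, and IGNORE values here are made up for illustration.
SEP, PAD, IGNORE = 1, 0, -100

def make_example(src_ids, tgt_ids, max_len):
    """Concatenate src + SEP + tgt; mask the loss over the source span."""
    ids = src_ids + [SEP] + tgt_ids
    # Labels ignore the source and separator so only target tokens are scored.
    labels = [IGNORE] * (len(src_ids) + 1) + tgt_ids
    # Right-pad both to a fixed length.
    ids += [PAD] * (max_len - len(ids))
    labels += [IGNORE] * (max_len - len(labels))
    return ids, labels

ids, labels = make_example([5, 6, 7], [8, 9], max_len=8)
# ids    -> [5, 6, 7, 1, 8, 9, 0, 0]
# labels -> [-100, -100, -100, -100, 8, 9, -100, -100]
```

At inference time you feed only `src + [SEP]` and sample until an end token, which is where long speech-token targets make the context grow quickly.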
That is unfortunate. I'm trying to do text-to-speech generation, so concatenation would be less than ideal since the speech token sequences would be quite long. Would this be an issue?
I was thinking about using Mamba as a feature extractor and feeding the feature embeddings into another model, but this still seems less than ideal.
Has there been any work done on non-autoregressive tasks with Mamba?
There is some work on "bidirectional" Mamba; you can search for it.
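The bidirectional variants generally run a causal block over the sequence twice, once forward and once on the time-reversed sequence, and combine the two outputs. A minimal NumPy sketch of that wrapper, using a causal running mean as a stand-in where a real Mamba layer (e.g. mamba_ssm's `Mamba` module) would go:

```python
import numpy as np

def causal_block(x):
    # Stand-in for a causal Mamba layer: output at step t depends only on
    # steps <= t (here, a running mean over the prefix). A real version
    # would call a Mamba module here.
    return np.cumsum(x, axis=0) / np.arange(1, x.shape[0] + 1)[:, None]

def bidirectional(x):
    fwd = causal_block(x)                  # left-to-right pass
    bwd = causal_block(x[::-1])[::-1]      # run on reversed input, flip back
    return np.concatenate([fwd, bwd], axis=-1)  # (L, 2 * D)

x = np.random.randn(10, 4)
y = bidirectional(x)  # shape (10, 8): each step sees both directions
```

Papers differ on whether the two passes share weights and whether the combine step is concatenation, addition, or gating, but the flip-run-flip structure is the common core.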
I see. Seems like bidirectional Mamba has mostly been applied to images.
Is it possible to extract the embeddings and tokenize them, using Mamba as a feature extractor?
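Extracting embeddings and then discretizing them is typically done with a learned codebook, VQ-VAE style: each frame-level feature vector is mapped to the index of its nearest codebook entry. A minimal sketch of that tokenization step, assuming the embeddings have already been produced by some Mamba feature extractor (the shapes and codebook here are illustrative):

```python
import numpy as np

def tokenize(embeddings, codebook):
    # Assign each embedding to its nearest codebook entry (squared L2),
    # as in VQ-VAE-style tokenization.
    dists = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)  # one integer token per frame

rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 16))        # e.g. 50 frame-level Mamba features
codebook = rng.normal(size=(256, 16))  # 256-entry codebook (assumed learned)
tokens = tokenize(emb, codebook)       # shape (50,), values in [0, 256)
```

In practice the codebook is trained jointly with the extractor (or taken from an existing audio tokenizer), but the assignment step is just this nearest-neighbor lookup.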
I'm looking to accomplish some sort of vector-to-vector task with Mamba. Does any encoder-decoder architecture exist, or are there alternative approaches to this task using Mamba? Or is concatenating the output onto the input the only option?