The interface is similar to any sequence-to-sequence block such as LSTM or Attention. I don't think any of those blocks requires the sequence length beforehand.
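For illustration, a minimal sketch of that point (assuming the `mamba_ssm` package and a CUDA device): the block is constructed without any sequence length, and the same instance accepts inputs of different lengths.

```python
import torch
from mamba_ssm import Mamba

block = Mamba(d_model=64).cuda()   # no sequence-length argument at construction time

short = torch.randn(2, 10, 64, device="cuda")    # (batch, seqlen=10, d_model)
longer = torch.randn(2, 500, 64, device="cuda")  # (batch, seqlen=500, d_model)

print(block(short).shape)   # torch.Size([2, 10, 64])
print(block(longer).shape)  # torch.Size([2, 500, 64])
```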
I see, thank you so much for the reply! In that case, what would be the correct way to use Mamba autoregressively, for example for next-frame prediction on video? Will it train correctly if my training data contains image sequences of different lengths? I'm not sure which of these is the correct approach:
1. Train on image sequences of variable lengths. Each time, predict the entire sequence shifted forward by one time step, and append the last prediction to the current sequence for autoregressive generation.
2. Pass in image sequences of a fixed length, where each sequence consists of the actual frames plus padding.
Thanks again!
You can use it however you would use an Attention block. The easiest approach would be to pad (on the right) so that all sequences in the batch have equal length. Variable-length batches are technically possible but not implemented yet.
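A sketch of that right-padding step (the mask and variable names here are my own illustration, not something the library provides):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

d_model = 64
# Three sequences of lengths 7, 12, and 5, already embedded to d_model features.
seqs = [torch.randn(length, d_model) for length in (7, 12, 5)]
lengths = torch.tensor([s.shape[0] for s in seqs])

# pad_sequence pads after the real tokens (i.e. on the right) up to the longest length.
batch = pad_sequence(seqs, batch_first=True)            # (3, 12, d_model)

# Boolean mask marking the real (non-padded) positions, for masking the loss.
mask = torch.arange(batch.shape[1])[None, :] < lengths[:, None]   # (3, 12)
```

Because the recurrence runs causally left to right, padding placed after the real frames cannot affect the outputs at the real positions; the mask is only needed so the loss ignores the padded tail.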
Got it, thank you so much! I was previously trying to use variable-length sequences and was wondering why it wasn't training well.
Closing this
When initializing a Mamba block, why is it that we don't need to pass in a sequence length input?
For example, I want to do next-step prediction by mapping timesteps [s ... t] to [s + 1 ... t + 1]. Does this mean that I can train Mamba on any arbitrary sequence length, and that it will accurately do next-step prediction for input of any arbitrary length? I'm wondering if I can do autoregressive prediction this way, or whether we are expected to pass in a constant sequence length.
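For what it's worth, a minimal sketch of that shifted-target setup (assuming `mamba_ssm` on a CUDA device; the linear head, random frame embeddings, and MSE loss are illustrative choices, not a prescribed recipe):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from mamba_ssm import Mamba

d_model = 64
backbone = Mamba(d_model=d_model).cuda()
head = nn.Linear(d_model, d_model).cuda()   # project hidden states back to frame features

frames = torch.randn(4, 32, d_model, device="cuda")   # (batch, T, d_model) frame embeddings
inputs, targets = frames[:, :-1], frames[:, 1:]       # read [s .. t-1], predict [s+1 .. t]

preds = head(backbone(inputs))            # one next-frame prediction per input position
loss = F.mse_loss(preds, targets)
loss.backward()
```

At inference time you would feed the frames seen so far, take the prediction at the last position, append it to the input, and repeat.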