Hello. I have been reading Mamba-1 recently, and there is one issue that I do not quite understand. Based on the pseudocode for S6, the discretized A matrix has dimensions [B, L, D, N], and the hidden state h has dimensions [B, L, N]. I don't see how these two are multiplied. According to my understanding of the state-space equations, the A matrix should have dimensions [N, N]. How is the multiplication implemented here? There is a similar issue with the computation of the convolution kernel K: I don't understand how the dimensions are handled in the product of the matrices C, A^k, and B.
Thank you very much and I look forward to your response.
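The hidden state $h$ is also [B, L, D, N]. Because of state expansion, every channel of the original input of shape [B, L, D] gets expanded to its own N-dimensional state. Mamba also parameterizes A as diagonal, so each channel stores only the N diagonal entries instead of a full [N, N] matrix, and the product with $h$ is elementwise rather than a matrix multiplication.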
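To make the shapes concrete, here is a minimal NumPy sketch of the recurrence, not the actual Mamba kernel. It assumes a diagonal A stored as [D, N], B and C of shape [B, L, N], and the simplified discretization `delta * B` for the input term; the function name `selective_scan` and the argument names are just for illustration. The scan keeps one [B, D, N] state per time step, which corresponds to the [B, L, D, N] hidden state over the whole sequence.

```python
import numpy as np

def selective_scan(x, delta, A, B, C):
    """Sequential reference scan (illustrative only).

    x:     (bsz, L, D)  input
    delta: (bsz, L, D)  step sizes
    A:     (D, N)       diagonal entries of A, one row per channel
    B, C:  (bsz, L, N)  input-dependent projections
    Returns y of shape (bsz, L, D).
    """
    bsz, L, D = x.shape
    N = A.shape[1]

    # Discretized A: (bsz, L, D, 1) * (D, N) -> (bsz, L, D, N)
    A_bar = np.exp(delta[..., None] * A)
    # Discretized B times input (assumed simplified form): (bsz, L, D, N)
    Bx = delta[..., None] * B[:, :, None, :] * x[..., None]

    h = np.zeros((bsz, D, N))  # one N-dimensional state per channel
    ys = []
    for t in range(L):
        # Elementwise product: same result as diag(A_bar) @ h per channel
        h = A_bar[:, t] * h + Bx[:, t]
        # Contract the state dimension with C: (bsz, D, N) -> (bsz, D)
        ys.append((h * C[:, t, None, :]).sum(axis=-1))
    return np.stack(ys, axis=1)
```

The same diagonal structure would apply to a convolution kernel in the time-invariant case: each entry of C A^k B reduces, per channel, to a sum over n of C_n * A_n^k * B_n. As far as I understand, the selective S6 layer has time-varying parameters, so it is computed with a scan like the one above rather than with a fixed kernel K.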