johnma2006 / mamba-minimal

Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
Apache License 2.0

Discretization of `B` #2

Closed: jeromeku closed this issue 9 months ago

jeromeku commented 9 months ago

Thanks for the clear implementation!

Can you explain the discretization of $B$ in selective scan?

Equation 4 in Section 2 of the paper states $$\overline{B} = (\Delta A)^{-1} (\exp(\Delta A) - I) \cdot \Delta B$$

In your implementation, the input is mapped into the hidden state by the following:

 deltaB_u = einsum(delta, B, u, 'b l d_in, b l n, b l d_in -> b d_in l n')

which, if I understand correctly, implies that $\overline{B} = \Delta B$?
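For readers less familiar with einsum notation, the index pattern in that line can be spelled out with explicit loops. This is a minimal pure-Python sketch, not the repository's code: the helper name `delta_b_u` and the tiny nested-list inputs are hypothetical and chosen only for illustration. It shows that each output entry is the plain product `delta * B * u`, with no exponential factor involving `A`.

```python
def delta_b_u(delta, B, u):
    """Loop form of 'b l d_in, b l n, b l d_in -> b d_in l n' (illustrative helper).

    delta, u: nested lists of shape (b, l, d_in)
    B:        nested list of shape (b, l, n)
    Returns a nested list of shape (b, d_in, l, n).
    """
    b, l, d_in, n = len(delta), len(delta[0]), len(delta[0][0]), len(B[0][0])
    return [
        [
            [
                # One state vector of length n per (batch, channel, time step)
                [delta[i][t][d] * B[i][t][k] * u[i][t][d] for k in range(n)]
                for t in range(l)
            ]
            for d in range(d_in)
        ]
        for i in range(b)
    ]


# Hypothetical inputs: batch 1, sequence length 2, d_in 1, state size 2.
delta = [[[0.5], [0.25]]]
B = [[[1.0, 2.0], [3.0, 4.0]]]
u = [[[2.0], [10.0]]]
out = delta_b_u(delta, B, u)
```

Here `out[0][0][0]` is `[0.5 * 1.0 * 2.0, 0.5 * 2.0 * 2.0]`, that is, $\Delta B u$ for the first time step.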

johnma2006 commented 9 months ago

Hi, I added a comment here: https://github.com/johnma2006/mamba-minimal/blob/master/model.py#L307. `B` uses a simplified Euler discretization instead of ZOH. According to the authors, "performance doesn't change much with the simplification on B" (from a discussion I had with Albert).
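To see why the two discretizations are close, note that $(\Delta A)^{-1}(\exp(\Delta A) - I) = I + \tfrac{1}{2}\Delta A + \dots$, so the ZOH expression reduces to $\Delta B$ when $\Delta A$ is small. The sketch below compares the two for a single scalar entry of a diagonal $A$. It is an illustration under stated assumptions, not code from the repository: the function names `zoh_B` and `euler_B` and the values of `a` and `b` are hypothetical.

```python
import math


def zoh_B(delta, a, b):
    """ZOH-discretized B for one scalar entry a of a diagonal A.

    Computes (exp(delta*a) - 1) / (delta*a) * (delta*b); requires delta*a != 0.
    """
    x = delta * a
    # expm1(x) = exp(x) - 1, evaluated accurately when x is close to 0
    return math.expm1(x) / x * delta * b


def euler_B(delta, b):
    """Simplified Euler discretization: B_bar = delta * b."""
    return delta * b


a, b = -1.0, 1.0  # hypothetical values chosen for illustration
for delta in (0.001, 0.01, 0.1, 1.0):
    ratio = zoh_B(delta, a, b) / euler_B(delta, b)
    print(f"delta={delta}: ZOH / Euler = {ratio:.4f}")
```

The ratio approaches 1 as `delta` shrinks and drifts away from 1 as the magnitude of `delta * a` grows, so the two forms differ most for large step sizes.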

jeromeku commented 9 months ago

@johnma2006 Thanks for the clarification!