Closed lucasmgomez closed 5 months ago
Sure! Take a look at the Mamba diagram in Fig 3 of the Mamba paper. The first thing the input `x` does is split into two branches and go through two linear projections. `in_proj` is simply a way to compute both those linear projections at the same time in one matmul. They are split apart later: https://github.com/johnma2006/mamba-minimal/blob/master/model.py#L223 (`x` is the left branch, `res` is the right branch)
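
A minimal sketch of that idea (toy sizes, standalone PyTorch, not the actual `model.py` code): one fused `nn.Linear` of width `2 * d_inner` gives exactly the same result as two separate `d_inner`-wide projections sharing its weight rows.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_inner = 4, 8            # toy sizes; in Mamba, d_inner = expand * d_model

# One fused projection: d_model -> 2 * d_inner, computed in a single matmul
in_proj = nn.Linear(d_model, d_inner * 2, bias=False)

x_in = torch.randn(3, d_model)     # (batch, d_model)
# Split the fused output into the two branches from Fig 3
x, res = in_proj(x_in).split([d_inner, d_inner], dim=-1)

# Equivalent pair of separate projections, using the same weight rows
w_x, w_res = in_proj.weight.split([d_inner, d_inner], dim=0)
assert torch.allclose(x, x_in @ w_x.T)
assert torch.allclose(res, x_in @ w_res.T)
```

So the `* 2` in the output width just stacks the left-branch and right-branch projection matrices into one.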
Thanks that makes sense!
I understand that `d_inner` is `d_model * expand` (E = 2). But why is `self.in_proj = nn.Linear(args.d_model, args.d_inner * 2 ...)`? Why is the input projection expanded a second time by 2? I can't seem to find the answer in the referenced paper section 3.4.
Any clarification would be appreciated.