Closed: Chuxwa closed this issue 5 months ago.
Hi, I read your paper and like your work. However, I'm not sure whether L+SSM and C-SSM are computed along the token dimension (the L axis). Could you clarify?

Glad to hear that you like our work. They are not computed along the L axis, but along the C axis. As illustrated in Figure 3, L+SSM is equivalent to C+SSM, and together with C-SSM it is computed simultaneously along the C axis. Then, following Mamba, both are scanned in L+ order. We use the notation L+ to distinguish this from token flipping (L-), which forces the model to learn information about both the forward and the reverse order of tokens; we believe such information has a pseudo-order dependency. Does this answer your question? Sorry if the paper caused any misunderstanding.

Thanks for your reply. So in BiSSM, both L+SSM and C-SSM are scanned in L+ order. That solves my problem.
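A minimal toy sketch of the scan order discussed in this thread, assuming an input of shape (L, C). It is not the authors' implementation: `mix_then_scan`, `l_plus_ssm`, `c_minus_ssm`, `l_minus_ssm` and `bi_ssm` are hypothetical names, a fixed channel-mixing matrix `W` stands in for Mamba's input-dependent projections (it is what makes channel order matter), and summing the two branches in `bi_ssm` is an assumption. It shows that L+SSM and C-SSM differ only in channel order (the C axis) while both scan tokens first-to-last (L+), whereas token flipping (L-) reverses the token axis.

```python
import numpy as np

def mix_then_scan(x, W, a=0.9):
    """Toy stand-in for one SSM branch (hypothetical, not the paper's code).

    x: (L, C) sequence of L tokens with C channels.
    W: (C, C) channel-mixing matrix (assumed stand-in for Mamba's projections).
    The recurrence always runs in L+ order (token 0 -> token L-1);
    at each step all C channels are updated together.
    """
    u = x @ W                        # mix channels, shape (L, C)
    h = np.zeros(u.shape[1])
    y = np.empty_like(u)
    for t in range(u.shape[0]):      # L+ scan over tokens
        h = a * h + u[t]
        y[t] = h
    return y

def l_plus_ssm(x, W):
    # Original channel order (equivalent to C+), scanned in L+ order.
    return mix_then_scan(x, W)

def c_minus_ssm(x, W):
    # Reverse the channel axis (C-), scan in L+ order, restore channel order.
    return mix_then_scan(x[:, ::-1], W)[:, ::-1]

def l_minus_ssm(x, W):
    # Token flipping (L-), shown only for contrast: reverse tokens, scan, restore.
    return mix_then_scan(x[::-1], W)[::-1]

def bi_ssm(x, W):
    # Assumed combination of the two branches: an elementwise sum.
    return l_plus_ssm(x, W) + c_minus_ssm(x, W)
```

Because both BiSSM branches scan in L+ order, the output at token 0 depends only on token 0; with token flipping (L-), the output at token 0 depends on every token in the sequence.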