Dear Authors,
I hope this message finds you well. I am writing to discuss a segment in Section B.3 of your paper, which addresses the challenges associated with older RNNs, specifically regarding efficiency issues and the vanishing gradients problem. These challenges are attributed to their sequential nature. The passage states:
"Additionally, older RNNs famously suffered from efficiency issues and the vanishing gradients problem (Pascanu, Mikolov, and Bengio 2013), both caused by their sequential nature. The latter could be solved for some of the above RNNs by leveraging the parallel scan (Martin and Cundy 2018), but the former was difficult without theory later developed for SSMs. For example, modern structured SSMs differ in more careful parameterization of the recurrent dynamics inspired by classical SSM theory (e.g., through discretization (Gu, Johnson, Goel, et al. 2021; Gu, Johnson, Timalsina, et al. 2023)), or direct analysis (Orvieto et al. 2023))."

Considering the contributions of (Martin and Cundy 2018) and the subsequent discussion, it appears that "the latter" and "the former" may need to be swapped. This change would accurately reflect that inefficiencies can be somewhat mitigated through parallel scans, whereas the vanishing gradients problem represents a more complex challenge that is harder to address.
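To make the basis of my reading concrete: the parallel scan of Martin and Cundy (2018) addresses the *efficiency* of a linear recurrence h_t = a_t * h_{t-1} + b_t, because composing the affine maps h -> a*h + b is associative, so the O(T) sequential unroll can be replaced by an O(log T)-depth prefix scan. The sketch below is my own illustration, not code from either paper, and the function names are hypothetical; it shows only the associative combine and the sequential baseline it would replace.

```python
def combine(left, right):
    # Each element is a pair (a, b) representing the affine map h -> a*h + b.
    a1, b1 = left
    a2, b2 = right
    # Applying (a1, b1) first and then (a2, b2) gives
    # h -> a2*(a1*h + b1) + b2 = (a1*a2)*h + (a2*b1 + b2),
    # which is again affine, so `combine` is associative.
    return (a1 * a2, a2 * b1 + b2)

def sequential_scan(elems):
    # Baseline: an O(T) left fold, analogous to unrolling an RNN step by step.
    # out[t] = (A, B) such that h_t = A*h_0 + B for the initial state h_0.
    out = [elems[0]]
    acc = elems[0]
    for e in elems[1:]:
        acc = combine(acc, e)
        out.append(acc)
    return out
```

Because `combine` is associative, this fold can be computed as a parallel prefix scan (e.g., via `jax.lax.associative_scan`). Two points follow, which is why I believe the referents are swapped: the trick removes the *sequential-efficiency* bottleneck, but it requires linearity (older nonlinear RNNs cannot use it), and it does nothing by itself about vanishing gradients, which depend on the magnitudes of the a_t.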
Although this might seem like a minor point, I believe correcting it would enhance the clarity and accuracy of the discussion of this topic in Section B.3. Your work is greatly appreciated within the community, and ensuring readers have a clear understanding of these contributions and solutions is essential.
Thank you for your time and consideration. I eagerly await your response.
Best regards,
Jaewon Cheon