Dear Authors,
I hope this message finds you well. I am writing to discuss a segment in Section B.3 of your paper, which addresses the challenges associated with older RNNs, specifically regarding efficiency issues and the vanishing gradients problem. These challenges are attributed to their sequential nature. The passage states:
"Additionally, older RNNs famously suffered from efficiency issues and the vanishing gradients problem (Pascanu, Mikolov, and Bengio 2013), both caused by their sequential nature. The latter could be solved for some of the above RNNs by leveraging the parallel scan (Martin and Cundy 2018), but the former was difficult without theory later developed for SSMs. For example, modern structured SSMs differ in more careful parameterization of the recurrent dynamics inspired by classical SSM theory (e.g., through discretization (Gu, Johnson, Goel, et al. 2021; Gu, Johnson, Timalsina, et al. 2023)), or direct analysis (Orvieto et al. 2023))."

Considering the contributions of (Martin and Cundy 2018) and the subsequent discussion, it appears that "the latter" and "the former" may need to be swapped. This change would accurately reflect that inefficiencies can be somewhat mitigated through parallel scans, whereas the vanishing gradients problem represents a more complex challenge that is harder to address.
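To make the basis of my reading concrete: the parallel scan of Martin and Cundy (2018) addresses the *efficiency* of a linear recurrence h_t = a_t * h_{t-1} + b_t, because composing the affine maps h -> a*h + b is associative, so the O(T) sequential unroll can be replaced by an O(log T)-depth prefix scan. The sketch below is my own illustration, not code from either paper, and the function names are hypothetical; it shows only the associative combine and the sequential baseline it would replace.

```python
def combine(left, right):
    # Each element is a pair (a, b) representing the affine map h -> a*h + b.
    a1, b1 = left
    a2, b2 = right
    # Applying (a1, b1) first and then (a2, b2) gives
    # h -> a2*(a1*h + b1) + b2 = (a1*a2)*h + (a2*b1 + b2),
    # which is again affine, so `combine` is associative.
    return (a1 * a2, a2 * b1 + b2)

def sequential_scan(elems):
    # Baseline: an O(T) left fold, analogous to unrolling an RNN step by step.
    # out[t] = (A, B) such that h_t = A*h_0 + B for the initial state h_0.
    out = [elems[0]]
    acc = elems[0]
    for e in elems[1:]:
        acc = combine(acc, e)
        out.append(acc)
    return out
```

Because `combine` is associative, this fold can be computed as a parallel prefix scan (e.g., via `jax.lax.associative_scan`). Two points follow, which is why I believe the referents are swapped: the trick removes the *sequential-efficiency* bottleneck, but it requires linearity (older nonlinear RNNs cannot use it), and it does nothing by itself about vanishing gradients, which depend on the magnitudes of the a_t.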
Although this might seem like a minor point, I believe correcting it would enhance the clarity and accuracy of the discussion of this topic in Section B.3. Your work is greatly appreciated within the community, and ensuring readers have a clear understanding of these contributions and solutions is essential.
Thank you for your time and consideration. I eagerly await your response.
Best regards,
Jaewon Cheon