Closed looper99 closed 6 months ago
Hi, thanks for the question! Making the parameters data-dependent is not that difficult. The diagonal parameters, such as delta and/or A, are stored as vectors, so they should be straightforward: the data-dependent function just needs vector-valued outputs. The dense matrices B and C will need a different parameterization to stay efficient, so that you avoid a function with matrix-valued outputs. If you take care of this, it should not be drastically slower.
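To make the reply above concrete, here is a minimal JAX sketch of one way this could look. It is not the S5 repository's code. Everything named here (`init_params`, `selective_diag_ssm`, `W_delta`, `W_gate_b`, `W_gate_c`) is hypothetical, and the gating of B and C is just one assumed parameterization that keeps every data-dependent function vector-valued: B and C stay static and dense, and each is rescaled per timestep by a vector gate. A is made data-dependent only through the per-state step size delta.

```python
import jax
import jax.numpy as jnp


def init_params(key, H, P):
    # Hypothetical initialization, chosen only so the sketch runs.
    k = jax.random.split(key, 5)
    return {
        # Diagonal of A, stored as a complex vector with negative real part.
        "Lambda": -0.5 + 1j * jnp.arange(P, dtype=jnp.float32),
        # Static dense input and output matrices.
        "B": (jax.random.normal(k[0], (P, H)) / jnp.sqrt(H)).astype(jnp.complex64),
        "C": (jax.random.normal(k[1], (H, P)) / jnp.sqrt(P)).astype(jnp.complex64),
        # Linear maps whose outputs are vectors of size P (never matrices).
        "W_delta": jax.random.normal(k[2], (H, P)) / jnp.sqrt(H),
        "b_delta": jnp.zeros(P),
        "W_gate_b": jax.random.normal(k[3], (H, P)) / jnp.sqrt(H),
        "W_gate_c": jax.random.normal(k[4], (H, P)) / jnp.sqrt(H),
    }


def binary_operator(elem_i, elem_j):
    # Compose two steps of the diagonal recurrence x -> a * x + b,
    # where elem_i is the earlier step and elem_j the later one.
    a_i, b_i = elem_i
    a_j, b_j = elem_j
    return a_j * a_i, a_j * b_i + b_j


def selective_diag_ssm(params, u):
    """Map inputs u of shape (L, H) to real outputs of shape (L, H)."""
    lam = params["Lambda"]  # (P,)

    # Input-dependent step size, one positive value per state: (L, P).
    delta = jax.nn.softplus(u @ params["W_delta"] + params["b_delta"])
    # Assumed vector gates: B_t u_t = g_t * (B u_t), y_t = Re(C (h_t * x_t)).
    gate_b = jax.nn.sigmoid(u @ params["W_gate_b"])  # (L, P)
    gate_c = jax.nn.sigmoid(u @ params["W_gate_c"])  # (L, P)

    # Zero-order-hold discretization with a different delta per timestep.
    a_bar = jnp.exp(lam * delta)                           # (L, P)
    bu = (u.astype(lam.dtype) @ params["B"].T) * gate_b    # (L, P)
    b_bar = (a_bar - 1.0) / lam * bu                       # (L, P)

    # The recurrence is still linear and diagonal, so a parallel scan applies.
    _, xs = jax.lax.associative_scan(binary_operator, (a_bar, b_bar))
    return ((xs * gate_c) @ params["C"].T).real            # (L, H)
```

Compared with a time-invariant layer, the extra work in this sketch is three `(L, H) x (H, P)` matrix products and some elementwise operations. The scan itself keeps the same structure, because its elements were already per-timestep `(L, P)` arrays.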
Dear authors, thank you for your great work.
I was wondering, how hard would it be to make the S5 model input-dependent, like Mamba? That is, for the matrices B and C and for delta, but also even for A?
If you do this with the current implementation, would it be drastically slower?