YongLD closed this issue 7 hours ago
Is there any code for using the TTT layer in the Mamba backbone? If not, what should be considered when mapping A, B, and C in Mamba to Q, K, and V in TTT?
You can take a look at our JAX codebase to see how our Mamba backbone is implemented: https://github.com/test-time-training/ttt-lm-jax