Closed · TheTinyTeddy closed this issue 2 months ago
Hi,

I was wondering why, in the code, the sequence length increases as more tokens are predicted, since this is similar to how a Transformer decoder does inference.

My understanding of TTT is that it is an RNN with a per-token weight-update mechanism, so the sequence length at each decoding step should always be 1. Could you tell me if I am missing something?

Hi @TheTinyTeddy,

Thanks for your interest in our work. Your desired behavior requires setting up a cache during decoding. To enable this, set `configuration = TTTConfig(use_cache=True)`. Please let me know if this resolves your issue.
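For context, here is a minimal sketch of enabling this, following the usage pattern in the repository README. The `from ttt import ...` module path is assumed from ttt-lm-pytorch, and the tokenizer name is only illustrative:

```python
from transformers import AutoTokenizer
from ttt import TTTConfig, TTTForCausalLM  # module path assumed from ttt-lm-pytorch

# Enable the fixed-size decode cache so each generation step feeds
# only the newest token instead of re-running the whole prefix.
configuration = TTTConfig(use_cache=True)
model = TTTForCausalLM(configuration)
model.eval()

# Tokenizer choice is illustrative; any compatible tokenizer works.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
input_ids = tokenizer("Greetings from TTT!", return_tensors="pt").input_ids

# During generate(), the cached hidden state is updated in place
# token by token; its size stays constant.
out_ids = model.generate(input_ids=input_ids, max_length=50)
print(tokenizer.batch_decode(out_ids, skip_special_tokens=True)[0])
```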
Thank you for the reply!

So is there a similar notion of a "KV cache" in TTT, just like in a Transformer decoder?

Yes, but the cache size doesn't increase with sequence length. It is fixed-size, like in any other RNN.
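To make the contrast concrete, here is a toy sketch (illustrative names, not the repository's actual classes): a Transformer's KV cache appends one entry per decoded token, while a TTT layer's "cache" is the inner model's weight matrix, which is updated in place by a per-token gradient step and never changes shape:

```python
import torch

class ToyKVCache:
    """Transformer-style cache: storage grows with sequence length."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, k, v):
        self.keys.append(k)
        self.values.append(v)

class ToyTTTState:
    """TTT-style state: a fixed-size weight matrix updated per token."""
    def __init__(self, dim, lr=0.1):
        self.W = torch.zeros(dim, dim)
        self.lr = lr

    def step(self, x):
        # One gradient step on the self-supervised reconstruction loss
        # ||W x - x||^2, mirroring TTT's per-token weight update.
        grad = 2 * torch.outer(self.W @ x - x, x)
        self.W = self.W - self.lr * grad
        return self.W @ x  # output for this token

kv, ttt = ToyKVCache(), ToyTTTState(dim=4)
for _ in range(8):
    x = torch.randn(4)
    kv.step(x, x)
    ttt.step(x)
print(len(kv.keys), ttt.W.shape)  # 8 entries vs. a fixed (4, 4) state
```

So decoding with `use_cache=True` carries a constant-size state forward, which is why memory does not grow with the number of generated tokens.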