test-time-training / ttt-lm-pytorch

Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
MIT License

Confusion about inner and outer loops #8

Closed (zsszyx closed this issue 3 months ago)

zsszyx commented 3 months ago

The inner loop involves the K and V parameters and the model weights, but it only updates the model weights. The outer loop involves the Q parameters and the model weights, yet Q, K, and V all need to be updated. How do K and V get updated?

karan-dalal commented 3 months ago

The "outer loop" updates QKV and the initialization of the "inner loop" model weights. We backprop through the updates to the inner model (gradients of gradients) to update the inner loop initialization.
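
For readers following the thread, here is a minimal sketch of what the reply above describes. It is not the repository's code. It assumes a linear inner model W, a squared-error inner loss between W applied to k and v, one gradient step per token, and a toy squared-error outer loss. The names theta_Q, theta_K, theta_V, W0, eta, xs, and ys are hypothetical and chosen only for illustration. The point is that K and V receive gradients only because the outer loss is backpropagated through the inner update.

```python
import torch

torch.manual_seed(0)
d, T, eta = 4, 3, 0.1  # feature dim, sequence length, inner learning rate (illustrative)

# Outer-loop parameters (hypothetical names, not the repo's identifiers).
theta_Q = torch.nn.Parameter(0.5 * torch.randn(d, d))
theta_K = torch.nn.Parameter(0.5 * torch.randn(d, d))
theta_V = torch.nn.Parameter(0.5 * torch.randn(d, d))
W0 = torch.nn.Parameter(0.5 * torch.randn(d, d))  # inner-loop initialization

xs = torch.randn(T, d)  # toy input sequence
ys = torch.randn(T, d)  # toy outer-loop targets

W = W0
outer_loss = torch.zeros(())
for x, y in zip(xs, ys):
    q, k, v = theta_Q @ x, theta_K @ x, theta_V @ x
    # Inner loop: one gradient step on W only, with k as input and v as target.
    inner_loss = ((W @ k - v) ** 2).sum()
    (grad_W,) = torch.autograd.grad(inner_loss, W, create_graph=True)
    W = W - eta * grad_W  # kept in the graph: depends on theta_K, theta_V, and W0
    # The outer loss reads the updated inner model with q.
    outer_loss = outer_loss + ((W @ q - y) ** 2).sum()

# Outer loop: backprop through the inner updates (gradients of gradients).
outer_loss.backward()
outer_params = {"theta_Q": theta_Q, "theta_K": theta_K, "theta_V": theta_V, "W0": W0}
grad_norms = {name: p.grad.norm().item() for name, p in outer_params.items()}
```

In this sketch, all four entries of outer_params end up with a gradient. An optimizer built over them would update Q, K, V, and the inner initialization together, while the per-token W is never stored as a trained parameter.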