The inner loop involves the parameters k, v and the model weights, but it only updates the model weights. The outer loop involves the parameter q and the model weights, yet q, k and v all need to be updated. How do k and v get updated?
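The "outer loop" updates QKV and the initialization of the "inner loop" model weights. We backprop through the updates to the inner model (gradients of gradients) to update the inner loop initialization. The same backward pass gives gradients for K and V, because the updated inner weights depend on them through the inner loss.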
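To make the gradient flow concrete, here is a minimal JAX sketch, not the actual implementation. It assumes a linear inner model, a single inner gradient step, and a squared-error loss in both loops; the names `inner_loss`, `inner_update`, `outer_loss`, the shapes, and the learning rates are all hypothetical and chosen only for illustration.

```python
import jax
import jax.numpy as jnp

def inner_loss(w, params, x):
    # Assumed inner objective: reconstruct the V view from the K view.
    return jnp.mean((x @ params["k"] @ w - x @ params["v"]) ** 2)

def inner_update(params, x, lr=0.1):
    # One inner-loop step: only the inner weights change, starting from w0.
    w0 = params["w0"]
    g = jax.grad(inner_loss)(w0, params, x)
    return w0 - lr * g

def outer_loss(params, x, y):
    # The outer objective applies the *updated* inner weights to the Q view.
    w1 = inner_update(params, x)
    pred = x @ params["q"] @ w1
    return jnp.mean((pred - y) ** 2)

d = 4  # hypothetical feature dimension
kq, kk, kv, kw, kx, ky = jax.random.split(jax.random.PRNGKey(0), 6)
params = {
    "q": jax.random.normal(kq, (d, d)),
    "k": jax.random.normal(kk, (d, d)),
    "v": jax.random.normal(kv, (d, d)),
    "w0": jax.random.normal(kw, (d, d)),  # inner-loop initialization
}
x = jax.random.normal(kx, (8, d))
y = jax.random.normal(ky, (8, d))

# Differentiating the outer loss goes through jax.grad in inner_update,
# so q, k, v and w0 all receive gradients (gradients of gradients).
outer_grads = jax.grad(outer_loss)(params, x, y)

# One plain SGD outer step over every outer-loop parameter.
new_params = jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, outer_grads)
```

In this sketch, k and v never appear in the outer prediction directly. They get a gradient only because `w1` is a function of them via the inner step, which is the path the answer above describes.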