Closed: manmay-nakhashi closed this issue 1 year ago.
As written in the paper: 10 Q-K-V attention layers for in-context learning, which have 512 hidden dimensions and 8 attention heads and are placed every 3 1D convolution layers.
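For reference, here is a minimal PyTorch sketch of that layout: an `nn.MultiheadAttention` layer (512 dims, 8 heads, self-attention so Q = K = V) inserted after every 3 `Conv1d` layers, 10 attention layers in total. The class name `ConvAttentionStack`, the ReLU activations, the kernel size, and the residual connection around attention are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class ConvAttentionStack(nn.Module):
    """Hypothetical sketch: one Q-K-V self-attention layer after
    every 3 Conv1d layers, repeated 10 times (512 dims, 8 heads)."""

    def __init__(self, channels=512, n_heads=8, n_attn_layers=10, convs_per_attn=3):
        super().__init__()
        blocks = []
        for _ in range(n_attn_layers):
            for _ in range(convs_per_attn):
                # kernel_size=3, padding=1 preserves the time dimension (assumption)
                blocks.append(nn.Conv1d(channels, channels, kernel_size=3, padding=1))
                blocks.append(nn.ReLU())  # activation choice is an assumption
            blocks.append(nn.MultiheadAttention(channels, n_heads, batch_first=True))
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):  # x: (batch, channels, time)
        for block in self.blocks:
            if isinstance(block, nn.MultiheadAttention):
                # MultiheadAttention with batch_first expects (batch, time, channels)
                h = x.transpose(1, 2)
                attn_out, _ = block(h, h, h)  # self-attention: Q = K = V
                x = x + attn_out.transpose(1, 2)  # residual connection (assumption)
            else:
                x = block(x)
        return x

model = ConvAttentionStack()
x = torch.randn(2, 512, 100)   # (batch, channels, frames)
y = model(x)                   # same shape: (2, 512, 100)
```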
Closed, as this is already done.
@manmay-nakhashi oh oops, didn't know you were working on this! Thank you regardless.