In particular, the following changes should be of note:
- Every `nn.Linear` instantiation is now replaced by the `get_linear` function, which returns an `nn.Linear` with Xavier initialization. This also affects how the `E` and `F` matrices are initialized (a sketch of such a helper follows below).
- There are no more `w_q`, `w_k`, and `w_v` matrices in the `LinearAttentionHead` module. Instead, in the `MHAttention` module, `to_{q,k,v}` is now a `ModuleList`, and each one holds `nhead` `nn.Linear` layers, one per head, corresponding to the weight matrices in the original paper (see the second sketch below).
- Fixed a bug where there were still `**kwargs` in the `"C2"` checkpoint function (see the note below on why this matters for checkpointing).
Changed some other things in the code, as discussed here: https://github.com/tatp22/linformer-pytorch/issues/6
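A minimal sketch of what such a `get_linear` helper might look like; the exact signature in the repository may differ, but the idea is an `nn.Linear` whose weight is Xavier-initialized (and whose bias, if present, is zeroed):

```python
import torch.nn as nn

def get_linear(in_dim, out_dim, bias=True):
    """Sketch only: build an nn.Linear with a Xavier-initialized weight."""
    linear = nn.Linear(in_dim, out_dim, bias=bias)
    nn.init.xavier_normal_(linear.weight)
    if bias:
        nn.init.constant_(linear.bias, 0.0)
    return linear

# The E and F low-rank projections from the Linformer paper can then be
# built the same way; seq_len and k are assumed dimension names here:
# E = get_linear(seq_len, k, bias=False)
```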
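Similarly, a sketch of how `to_{q,k,v}` could be organized in `MHAttention` as `ModuleList`s of per-head projections, using the `get_linear` helper sketched above. The constructor arguments (`dim`, `dim_head`, `nhead`) are illustrative, not the repository's exact signature:

```python
import torch.nn as nn

class MHAttention(nn.Module):
    """Sketch only: one nn.Linear per head for each of Q, K, and V."""
    def __init__(self, dim, dim_head, nhead):
        super().__init__()
        # Each ModuleList holds nhead projections, replacing the single
        # w_q / w_k / w_v matrices that lived inside LinearAttentionHead.
        self.to_q = nn.ModuleList([get_linear(dim, dim_head) for _ in range(nhead)])
        self.to_k = nn.ModuleList([get_linear(dim, dim_head) for _ in range(nhead)])
        self.to_v = nn.ModuleList([get_linear(dim, dim_head) for _ in range(nhead)])
```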
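As for the `**kwargs` bug: at the time, `torch.utils.checkpoint.checkpoint` forwarded only positional arguments to the wrapped function, so keyword arguments left in a checkpointed callable were never passed through. A hedged illustration of the general pitfall (not the repository's actual `"C2"` code):

```python
import torch
from torch.utils.checkpoint import checkpoint

def layer(x, scale=1.0):
    return x * scale

x = torch.randn(2, 4, requires_grad=True)

# Pitfall: keyword arguments do not survive the checkpoint boundary here.
# out = checkpoint(layer, x, scale=0.5)  # rejected by older PyTorch versions

# Workaround: bind keyword arguments before checkpointing.
out = checkpoint(lambda t: layer(t, scale=0.5), x)
```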