Closed: zhouweii234 closed this issue 3 years ago
Why is requires_grad set to False for f_qr, f_kr, f_sve, and f_sv? With this setting these parameters cannot be trained, but in your paper they are described as learnable.
The gates are made learnable after some initial epochs. Please check #16.
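As a rough illustration of that schedule, here is a minimal PyTorch sketch, not the repository's actual code. The gate names f_qr, f_kr, f_sve, and f_sv come from the question; `GatedBlock`, `set_gates_trainable`, and the `WARMUP_EPOCHS` value are hypothetical and exist only for this example. The thread does not say how many epochs the warm-up lasts.

```python
import torch
import torch.nn as nn

GATE_NAMES = ("f_qr", "f_kr", "f_sve", "f_sv")  # gate names from the question
WARMUP_EPOCHS = 2  # hypothetical; the thread only says "some initial epochs"


class GatedBlock(nn.Module):
    """Toy stand-in for a layer with gate parameters (not the repository's layer)."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        for name in GATE_NAMES:
            # Gates start frozen, matching requires_grad=False in the question.
            setattr(self, name, nn.Parameter(torch.tensor(1.0), requires_grad=False))

    def forward(self, x):
        scale = self.f_qr * self.f_kr * self.f_sve * self.f_sv
        return self.linear(x) * scale


def set_gates_trainable(model, trainable):
    """Toggle requires_grad on every parameter whose final name part is a gate name."""
    for name, param in model.named_parameters():
        if name.split(".")[-1] in GATE_NAMES:
            param.requires_grad_(trainable)


model = GatedBlock()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)

for epoch in range(4):
    if epoch == WARMUP_EPOCHS:
        set_gates_trainable(model, True)  # gates become learnable from here on
    optimizer.zero_grad()
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()  # frozen gates have no gradient and are skipped
```

Because the gates are passed to the optimizer from the start, flipping `requires_grad` is enough for them to begin receiving updates; no new optimizer needs to be built in this sketch.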