A concise but complete full-attention transformer with a set of promising experimental features from various papers
4.63k stars, 395 forks
Bug fix in the forward method of the LearnedAlibiPositionalBias class #147
Closed
taemincho closed 1 year ago
Deleted leftover variables that should already have been removed.