whitebox-research / excursions

What is the effect of the choice of padding strategy on transformer performance? #6

Open kgreyy opened 3 months ago

kgreyy commented 3 months ago
zrkrlc commented 3 months ago

By 'attention mask', you mean this, right? https://lukesalamone.github.io/posts/what-are-attention-masks/