Closed lukasfrank closed 4 years ago

I'm studying the model and was wondering about a few things:

@lukasfrank For your first question: position_ids are inferred by model.forward if position_ids=None (the default) is passed. Labels set to -1 "are ignored (masked)".

All the attention in TransferTransfo is masked self-attention, which means that all future tokens are masked. The padding tokens are therefore masked as well when the future mask is applied, so they are not involved in the attention computation.
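A minimal sketch of the position_ids default described above. This is not the library's code, just an illustration of the behavior: when position_ids=None, each sequence gets positions 0..seq_len-1 (the helper name default_position_ids is made up for this example):

```python
def default_position_ids(input_ids):
    """Illustration: with position_ids=None, positions default to 0..seq_len-1
    for every sequence in the batch."""
    seq_len = len(input_ids[0])
    return [list(range(seq_len)) for _ in input_ids]

# One batch entry of four token ids -> positions [0, 1, 2, 3]
positions = default_position_ids([[50, 51, 52, 53]])
```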
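The "-1 labels are ignored" point can be sketched the same way. This is a toy version of loss masking (not the actual loss code): per-token losses whose label is the ignore index simply don't contribute to the average:

```python
def masked_mean_loss(per_token_losses, labels, ignore_index=-1):
    """Average per-token losses, skipping positions whose label is ignore_index."""
    kept = [loss for loss, label in zip(per_token_losses, labels)
            if label != ignore_index]
    return sum(kept) / len(kept)

# The middle token has label -1, so only the first and last losses count.
loss = masked_mean_loss([1.0, 2.0, 3.0], [5, -1, 7])
```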
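The claim about padding and the future mask can also be made concrete. Below is a sketch of a causal (future) mask, assuming right-padding: entry [i][j] is True when position i may attend to position j. Since attention only looks at j <= i, no real token at an earlier position can ever attend to a padding token sitting at a later position:

```python
def causal_mask(seq_len):
    """Boolean mask: row i lists which positions i may attend to (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

# With right-padding on a length-4 sequence where position 3 is padding,
# positions 0-2 can never attend to position 3 under the causal mask.
mask = causal_mask(4)
```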