Closed: sshleifer closed this issue 4 years ago
We use padding, but it is not marked by a dedicated value; instead it is handled through a mask.
The masking generally ensures that padding symbols do not get propagated. We do, however, fill the batch with the same id as the </s> (end-of-sentence) symbol, which is vocabulary dependent. </s> does not receive gradients during training in its padding positions.
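To make that concrete, here is a minimal PyTorch sketch (not Marian's actual implementation) of the idea: the batch is filled to a fixed length with the </s> id, and a mask zeroes out the loss at those filler positions so they receive no gradient. The `eos_id`, vocabulary size, and tensor shapes below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative assumption: </s> has id 0 in this toy vocabulary.
eos_id = 0
vocab_size = 8

# Two target sequences of different lengths, padded to the same
# length with the </s> id (as described above).
targets = torch.tensor([
    [5, 3, eos_id, eos_id],  # real length 3 (incl. sentence-final </s>), 1 padding slot
    [2, 7, 4, eos_id],       # real length 4, no padding
])

# Mask marking real tokens (1) vs. padding filler (0).
mask = torch.tensor([
    [1., 1., 1., 0.],
    [1., 1., 1., 1.],
])

# Toy logits standing in for the decoder output.
logits = torch.randn(2, 4, vocab_size, requires_grad=True)

# Token-level cross-entropy, with padded positions zeroed out by the mask
# so they contribute nothing to the loss (and hence no gradient).
loss_per_token = F.cross_entropy(
    logits.view(-1, vocab_size), targets.view(-1), reduction="none"
).view(2, 4)
loss = (loss_per_token * mask).sum() / mask.sum()
loss.backward()

# Gradient at the masked (padding) position is exactly zero.
print(logits.grad[0, 3].abs().sum())  # tensor(0.)
```

The filler id itself is irrelevant here; only the mask decides which positions affect the loss, which is why reusing the </s> id is safe.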
Makes sense, thanks!
Do marian_decoder or marian_train use padding to make batches on inputs a fixed shape? If so, which token/id is used to indicate padding?