k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall

Bug in decoder_padding_mask in BPE training #242

Open · csukuangfj opened this issue 2 years ago

csukuangfj commented 2 years ago

See the following lines of snowfall/models/transformer.py: https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L162

https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L167

https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L179

https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L709

https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L720-L721


You can see that ys_in_pad is padded with eos_id, which is a positive word-piece ID.

However, decoder_padding_mask compares ys_in_pad against -1 to compute the mask, so the eos-padded positions are never marked as padding and the resulting mask is all False.
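
Here is a minimal sketch of the mismatch. The eos_id value and the toy sequences are made up for illustration; the mask computation mirrors the `ys_pad == ignore_id` comparison in the linked decoder_padding_mask with its default `ignore_id=-1`:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

eos_id = 500  # a positive word-piece ID, as in BPE training
ys = [torch.tensor([2, 7, 9]), torch.tensor([3, 5])]

# ys_in_pad is padded with eos_id (as add_sos_eos / padding does above)
ys_in_pad = pad_sequence(ys, batch_first=True, padding_value=eos_id)
# tensor([[  2,   7,   9],
#         [  3,   5, 500]])

# ... but the mask compares against -1, so no position is ever masked
mask = ys_in_pad == -1
# tensor([[False, False, False],
#         [False, False, False]])  <- the eos-padded position is NOT masked
```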


This bug may explain why the WERs vary with batch size: the amount of unmasked eos padding the decoder attends to depends on how the utterances are batched. I suspect it also affects training.
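
One possible fix, sketched here under the assumption that the true sequence lengths are available at this point: derive the mask from the lengths instead of comparing token values against a sentinel, which also avoids accidentally masking a genuine eos that appears in the targets:

```python
import torch

eos_id = 500
ys_lens = torch.tensor([3, 2])  # lengths before padding (illustrative)
ys_in_pad = torch.tensor([[2, 7, 9],
                          [3, 5, eos_id]])  # padded with eos_id

# Mask every position at or beyond each sequence's true length,
# independent of which token value was used for padding.
max_len = ys_in_pad.size(1)
mask = torch.arange(max_len).unsqueeze(0) >= ys_lens.unsqueeze(1)
# tensor([[False, False, False],
#         [False, False,  True]])  <- the eos-padded position is masked
```

Alternatively, passing eos_id as the ignore_id would also work, but only if eos_id can never occur inside the padded region's real tokens; the length-based mask does not depend on that.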