Open csukuangfj opened 2 years ago
See the code below: https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L162
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L167
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L179
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L709 https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L720-L721
You can see that ys_in_pad is padded with eos_id, which is a positive word piece ID.
However, the mask for ys_in_pad is computed by comparing against -1, so the positions padded with eos_id are never masked.
This bug may explain why the WERs differ with respect to batch size, since the amount of padding depends on which utterances end up in the same batch. I guess it also affects training.
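To make the mismatch concrete, here is a minimal pure-Python sketch. It is not the snowfall code: the helper names `pad` and `padding_mask`, the token IDs, and the values `sos_id = 1` and `eos_id = 2` are all made up for illustration. It only shows that comparing against -1 finds nothing to mask once the padding value is eos_id, and that a mask built from the true lengths (one possible fix, not necessarily the one that should be adopted) does mark the padded positions.

```python
from typing import List


def pad(seqs: List[List[int]], pad_value: int) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_value] * (max_len - len(s)) for s in seqs]


def padding_mask(padded: List[List[int]], pad_value: int) -> List[List[bool]]:
    """True marks a position that is treated as padding."""
    return [[tok == pad_value for tok in row] for row in padded]


# Hypothetical IDs, chosen only for this example.
sos_id, eos_id = 1, 2
token_ids = [[5, 6, 7], [8]]

# Decoder input: prepend sos_id, then pad with eos_id (as described in the issue).
ys_in = [[sos_id] + t for t in token_ids]
ys_in_pad = pad(ys_in, eos_id)

# Mask computed against -1: no entry equals -1, so nothing is masked.
mask_with_minus_one = padding_mask(ys_in_pad, -1)

# Mask computed from the true lengths: padded positions are marked.
lengths = [len(y) for y in ys_in]
mask_from_lengths = [
    [i >= n for i in range(len(row))] for row, n in zip(ys_in_pad, lengths)
]

print(ys_in_pad)
print(mask_with_minus_one)
print(mask_from_lengths)
```

With the -1 comparison, the second sequence's two padded positions are treated as real tokens, so the decoder's output for that sequence can depend on how much padding the batch happens to contain.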