Closed cifkao closed 13 hours ago
@ylacombe if you have a bit of time to give this a look, it'd be very appreciated.
Hey @cifkao, thanks for opening the issue! It's not an obvious issue to spot, so congrats on this and on providing a clear snippet to reproduce.
I've opened a PR to fix it, don't hesitate to comment directly on it if necessary!
TL;DR: Scores corresponding to the wrong sequence in the batch/beam are returned.
System Info

transformers version: 4.43.2

Who can help?
@sanchit-gandhi @ylacombe @patrickvonplaten
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
We can see that the scores returned by generate() are similar (though not identical) when the beam index is 0, but are much lower, and even -inf, when the beam index is 1, suggesting that we are getting scores from the wrong sequence in the beam. (I guess a small difference in the scores in the vicinity of timestamps might be explained by the logits processor, but the score of the generated token should clearly never be -inf.)
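For illustration only (a toy numpy sketch, not the actual transformers tensors), here is how reading the wrong beam's row can turn a finite score into -inf when a logits processor suppressed that token for the other hypothesis:

```python
import numpy as np

# Toy illustration: with num_beams=2, each decoding step yields one
# row of post-processor scores per beam hypothesis.
num_beams, vocab_size = 2, 5
step_scores = np.full((num_beams, vocab_size), np.log(0.2))  # uniform log-probs

# Suppose a logits processor (e.g. a timestamp rule) suppressed token 3
# for beam 0 only, setting its score to -inf.
step_scores[0, 3] = -np.inf

generated_token = 3  # the token actually chosen by beam 1

correct = step_scores[1, generated_token]  # finite log-probability
wrong = step_scores[0, generated_token]    # -inf: read from the wrong row
```

The token is perfectly valid for beam 1, but looking it up in beam 0's row reports it as impossible.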
The bug seems to be here in _postprocess_outputs. This works fine with num_beams==1, but with num_beams>1, the shape of the items in seek_outputs["scores"] will be [num_beams * batch_size, vocab_size], while the code expects it to be [batch_size, vocab_size].
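A minimal numpy sketch of the mismatch (the row layout row = batch_idx * num_beams + beam_idx is an assumption about how the beam hypotheses are flattened, and the variable names are illustrative, not the actual _postprocess_outputs code):

```python
import numpy as np

batch_size, num_beams, vocab_size = 2, 2, 4

# One decoding step under beam search: one score row per (batch, beam)
# pair, flattened to [batch_size * num_beams, vocab_size]. Assumed
# layout: row = batch_idx * num_beams + beam_idx.
scores = np.arange(batch_size * num_beams * vocab_size, dtype=float)
scores = scores.reshape(batch_size * num_beams, vocab_size)

batch_idx, beam_idx = 1, 1  # we want batch item 1, beam 1

# Buggy indexing, as if scores had shape [batch_size, vocab_size]:
wrong_row = scores[batch_idx]                         # row 1: batch 0, beam 1

# Correct indexing for the flattened beam layout:
right_row = scores[batch_idx * num_beams + beam_idx]  # row 3: batch 1, beam 1
```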
Therefore, instead of choosing the correct sequence in the beam/batch, this code will incorrectly combine scores from different sequences.

Expected behavior
The scores returned from
generate()
should be the same as in the forward pass.