This PR ensures the padding mask is correctly constructed for both the un-prompted and prompted cases.
Un-prompted
Given input ids of:
<bos> a b c d e <eos>
The corresponding labels are the ids shifted by one position (the last N-1 ids), and the decoder input ids are the first N-1 ids:
labels:     a     b     c     d     e     <eos>
            ↑     ↑     ↑     ↑     ↑     ↑
input ids:  <bos> a     b     c     d     e
Prompted
For prompted ids of format:
<prev> f g h i <bos> a b c d e <eos>
We should have:
labels:                                    a     b     c     d     e     <eos>
                                           ↑     ↑     ↑     ↑     ↑     ↑
input ids:  <prev> f     g     h     i     <bos> a     b     c     d     e
=> The important aspect is that the labels never include the <bos> token id as a prediction target, as they did prior to #77. The bug in #77 was that, for un-prompted ids, the first target label (a) was also masked. This PR corrects this behaviour.
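The intended behaviour for both cases can be illustrated with a minimal sketch. It is not the code from this PR: the function name `build_decoder_inputs_and_labels` and the use of string tokens are hypothetical, and masking with -100 is an assumption based on the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # assumption: masked label positions use PyTorch's default ignore_index


def build_decoder_inputs_and_labels(ids, bos="<bos>"):
    """Hypothetical helper: split ids into decoder inputs and (masked) labels."""
    decoder_input_ids = list(ids[:-1])  # first N-1 ids
    labels = list(ids[1:])              # ids shifted by one position

    # labels[i] is the target predicted from ids[i]. Every target up to and
    # including <bos> belongs to the prompt, so it is excluded from the loss.
    bos_index = ids.index(bos)
    for i in range(bos_index):
        labels[i] = IGNORE_INDEX
    return decoder_input_ids, labels


unprompted = ["<bos>", "a", "b", "c", "d", "e", "<eos>"]
prompted = ["<prev>", "f", "g", "h", "i", "<bos>", "a", "b", "c", "d", "e", "<eos>"]

print(build_decoder_inputs_and_labels(unprompted))
print(build_decoder_inputs_and_labels(prompted))
```

In the un-prompted case `bos_index` is 0, so nothing is masked and the first target label (a) is kept. In the prompted case the five targets f, g, h, i and <bos> are masked, leaving a b c d e <eos>.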