correctly mask bos in prompted ids

This PR ensures the padding mask is correctly constructed for both the un-prompted and prompted cases.

Un-prompted

Given input ids of:

<bos>       a        b        c        d        e     <eos>

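The corresponding labels are the ids shifted by one position (the last N-1 ids), and the decoder input ids are the first N-1 ids, so that each input token is trained to predict the token that follows it: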

labels:          a        b        c        d        e     <eos>

                 ↑        ↑        ↑        ↑        ↑        ↑

input ids:    <bos>       a        b        c        d        e
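
The shift above can be sketched in a few lines of Python. This is only an illustration of the alignment, not the code changed in this PR; the function name `shift_for_training` and the use of string tokens in place of integer ids are assumptions made for readability.

```python
def shift_for_training(ids):
    """Split one token sequence into (decoder_input_ids, labels).

    Hypothetical helper for illustration only.
    """
    decoder_input_ids = ids[:-1]  # first N-1 ids: <bos> a b c d e
    labels = ids[1:]              # last N-1 ids:  a b c d e <eos>
    return decoder_input_ids, labels


ids = ["<bos>", "a", "b", "c", "d", "e", "<eos>"]
decoder_input_ids, labels = shift_for_training(ids)

# Each input position is paired with the token it should predict.
for inp, tgt in zip(decoder_input_ids, labels):
    print(f"{inp:>6} -> {tgt}")
```

Note that no label is masked here: the first label is `a`, predicted from `<bos>`.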

Prompted

For prompted ids of format:

<prev>    f        g        h        i     <bos>    a        b        c        d        e     <eos>

We should have:

labels:                                                   a        b        c        d        e     <eos>

                                                          ↑        ↑        ↑        ↑        ↑        ↑

input ids:    <prev>    f        g        h        i     <bos>     a        b        c        d        e

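=> The important aspect is that, in the labels, we do not predict the <bos> token id, which is what happened prior to #77. The bug introduced in #77 was that, for un-prompted ids, we were also masking the first target label (a). This PR corrects this behaviour, so that only the label positions up to and including <bos> are masked in the prompted case, and nothing is masked in the un-prompted case.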
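
A minimal sketch of a masking rule that produces both diagrams above is given here. It is not the implementation from this PR: the function name `build_inputs_and_labels` and the string tokens are hypothetical, and the value -100 is assumed as the masked-label marker because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # assumed marker for label positions excluded from the loss


def build_inputs_and_labels(ids, bos_token="<bos>"):
    """Return (decoder_input_ids, labels) for prompted or un-prompted ids.

    Hypothetical helper for illustration only.
    """
    bos_idx = ids.index(bos_token)  # 0 when there is no prompt
    decoder_input_ids = ids[:-1]
    labels = ids[1:]
    # Mask every label whose target is a prompt token or <bos> itself.
    # With bos_idx == 0 (un-prompted) nothing is masked, so the first
    # target label is kept.
    labels = [IGNORE_INDEX] * bos_idx + labels[bos_idx:]
    return decoder_input_ids, labels


unprompted = ["<bos>", "a", "b", "c", "d", "e", "<eos>"]
prompted = ["<prev>", "f", "g", "h", "i"] + unprompted

unprompted_inputs, unprompted_labels = build_inputs_and_labels(unprompted)
prompted_inputs, prompted_labels = build_inputs_and_labels(prompted)
```

In the prompted case the five leading label positions are masked, so the first unmasked label (a) lines up with the <bos> input, as in the diagram. In the un-prompted case the labels are left untouched.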

huggingface / distil-whisper

correctly mask bos in prompted ids #79
