Closed xinsu626 closed 3 years ago
Because shifting the labels would make the offset depend on the number of tokens in the input. The index of each special tag would then vary from input to input, which would make it hard for the beam search algorithm to determine the finish code.
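To illustrate the point, here is a minimal sketch (not the repository's code) comparing the two layouts. The label names `Tag1` and `Tag2` and both helper functions are hypothetical; only the `[BOS, EOS, Tag1, ...]` ordering comes from this thread.

```python
# Hypothetical label block; only the [BOS, EOS, Tag1, ...] order is from the thread.
LABELS = ["BOS", "EOS", "Tag1", "Tag2"]


def ids_labels_first(num_tokens):
    """Labels take ids 0..len(LABELS)-1; token positions are shifted by len(LABELS)."""
    label_ids = {name: i for i, name in enumerate(LABELS)}
    token_ids = [len(LABELS) + pos for pos in range(num_tokens)]
    return label_ids, token_ids


def ids_tokens_first(num_tokens):
    """Token positions take ids 0..num_tokens-1; labels are shifted by num_tokens."""
    token_ids = list(range(num_tokens))
    label_ids = {name: num_tokens + i for i, name in enumerate(LABELS)}
    return label_ids, token_ids


for n in (3, 5):
    fixed_eos = ids_labels_first(n)[0]["EOS"]
    moving_eos = ids_tokens_first(n)[0]["EOS"]
    print(f"num_tokens={n}: EOS id labels-first={fixed_eos}, tokens-first={moving_eos}")
```

With the labels first, EOS stays at id 1 for every input, so the decoder can be given one constant stop id. With the tokens first, the EOS id changes with the input length.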
Hi @yhcc, got it. Thanks for your reply! Sorry, I have a follow-up question. Is it because you put the EOS token in the second position of the label space ([BOS, EOS, Tag1, ...]) that you set the EOS token id to 1 instead of BART's original 2 during generation (inference phase)?
Yes. We map the eos id 1 to 2 in the forward function of our model (so that BART can still get proper eos token id).
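A rough sketch of what such a mapping could look like, assuming a lookup table from label-space ids to BART vocabulary ids. This is not the repository's forward function: `LABEL_TO_BART`, `to_bart_ids`, and the tag ids 50265 and 50266 are hypothetical. Only the EOS mapping 1 to 2 is taken from this thread.

```python
# Hypothetical table: label-space id -> BART vocabulary id.
# Index 1 (EOS in the label space) maps to 2 (BART's EOS id), as stated in the thread.
# The ids for BOS and the two tags are illustrative assumptions.
LABEL_TO_BART = [0, 2, 50265, 50266]


def to_bart_ids(target_ids, src_token_ids):
    """Convert label-space/pointer ids into BART vocabulary ids.

    Ids below len(LABEL_TO_BART) are labels and go through the table.
    Larger ids point at source positions, shifted by the number of labels.
    """
    num_labels = len(LABEL_TO_BART)
    bart_ids = []
    for t in target_ids:
        if t < num_labels:
            bart_ids.append(LABEL_TO_BART[t])
        else:
            bart_ids.append(src_token_ids[t - num_labels])
    return bart_ids


src = [100, 200, 300]            # made-up BART ids of the source tokens
target = [0, 4, 5, 2, 1]         # BOS, token 0, token 1, Tag1, EOS
print(to_bart_ids(target, src))  # [0, 100, 200, 50265, 2]
```

Generation can then stop on the constant label-space id 1, while the decoder input still carries BART's own EOS id.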
@yhcc Thank you! This is really helpful.
Hello, nice work! Sorry if I missed something. I have a question about the decoder's output in your code.
Based on your code, it seems you're shifting the position indexes of the tokens by the number of labels. I was wondering why you shift the tokens instead of shifting the labels. Thank you!
Please find an example below (results from here).