Closed suttergustavo closed 1 year ago
This is just the setup in which the model was trained.
I see, but when training from scratch, is there a difference between using it or not? My question is basically: if I want to try a new Transformer backbone, should I use it or not?
Yes, I would rather train it with special_tokens_fix enabled, to avoid splitting the $START token into subwords.
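To make the splitting issue concrete, here is a minimal sketch (not GECToR's actual code) of a toy greedy subword tokenizer. The vocabulary entries and the `tokenize` helper are purely illustrative assumptions; the point is that a marker like `$START` gets broken into pieces unless it is added to the vocabulary as a single entry, which is what special_tokens_fix effectively ensures.

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a toy vocab."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: fall back to a single char
            i += 1
    return tokens

# Illustrative subword vocabulary (hypothetical).
base_vocab = {"$", "ST", "ART", "the", "cat"}

# Without the fix, $START is shattered into subwords:
print(tokenize("$START", base_vocab))   # ['$', 'ST', 'ART']

# With $START added as one vocabulary entry, it stays intact:
fixed_vocab = base_vocab | {"$START"}
print(tokenize("$START", fixed_vocab))  # ['$START']
```

If the model's embeddings were trained with `$START` as one token, the split version would feed it three unrelated subword embeddings instead, which is why training and inference must agree on this setting.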
Makes sense, thanks very much for the answers!
I understand that the special_tokens_fix is adding the $START token to the vocabulary, but can someone explain why we only do that for the RoBERTa model?