Open fatemeh-sh264 opened 4 years ago
Why did you use RoBERTa instead of BERT or ELMo?
In an ablation study (which we didn't publish) we found that RoBERTa fine-tunes better than BERT or GPT-2 itself. We expect ELECTRA would work well too.