nlpyang / PreSumm

code for EMNLP 2019 paper Text Summarization with Pretrained Encoders
MIT License

Is there a way to use this model with BPE tokenization (like XLM-RoBERTa) #196

Open Zhylkaaa opened 4 years ago

Zhylkaaa commented 4 years ago

Are there any plans to support RoBERTa-like models? Or, at least, can someone provide some hints on how I can adapt this code for that purpose?

I see a problem in the data preprocessing: the [unusedxxx] tokens that the BERT tokenizer provides don't exist in a RoBERTa tokenizer, and the text preprocessing for RoBERTa (BPE) tokenization is a bit different.
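For illustration, here is a minimal, untested sketch of one way to work around the missing [unusedxxx] slots: register replacement special tokens on a HuggingFace XLM-RoBERTa tokenizer and resize the model's embedding matrix. The token names `<tgt_bos>`, `<tgt_eos>`, `<sent_split>` are made up for this example, and the mapping onto PreSumm's preprocessing symbols is an assumption, not the repo's actual method.

```python
# Sketch (assumptions noted below): adapt an XLM-RoBERTa tokenizer so it can
# play the role BERT's [unused0]/[unused1]/[unused2] tokens play in PreSumm.
from transformers import XLMRobertaTokenizer, XLMRobertaModel

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')

# RoBERTa-style vocabularies have no reserved [unusedXXX] slots, so the
# target-side markers have to be added as new special tokens instead.
# The names here are placeholders, not anything PreSumm defines.
extra = ['<tgt_bos>', '<tgt_eos>', '<sent_split>']
tokenizer.add_special_tokens({'additional_special_tokens': extra})

# Rough equivalents of the BERT symbols used during preprocessing:
cls_token = tokenizer.cls_token   # '<s>'   instead of '[CLS]'
sep_token = tokenizer.sep_token   # '</s>'  instead of '[SEP]'
pad_token = tokenizer.pad_token   # '<pad>' instead of '[PAD]'
tgt_bos, tgt_eos, tgt_sent_split = extra

model = XLMRobertaModel.from_pretrained('xlm-roberta-base')
# The embedding matrix must grow to cover the newly added tokens.
model.resize_token_embeddings(len(tokenizer))

# Example: wrap one source sentence in CLS/SEP the way the BERT-based
# preprocessing does, but with BPE sub-tokens and RoBERTa special tokens.
ids = tokenizer.convert_tokens_to_ids(
    [cls_token]
    + tokenizer.tokenize('Pretrained encoders help summarization.')
    + [sep_token]
)
print(ids)
```

If something along these lines works, the remaining changes would presumably be in the preprocessing scripts, replacing the hard-coded BERT token strings with the variables above.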