purecloudlabs / roberta-tokenizer

MIT License

FacebookAI/xlm-roberta-large-finetuned-conll03-english #7

Open RaoufiTech opened 3 months ago

RaoufiTech commented 3 months ago

Hey, does this work with FacebookAI/xlm-roberta-large-finetuned-conll03-english too? And where can I find base_vocabulary.json?

yaireclipse commented 3 months ago

Hi @AiTester950, sorry, I don't think it does. FacebookAI/xlm-roberta-large-finetuned-conll03-english is a fine-tuned XLM-RoBERTa model, so it uses XLMRobertaTokenizer, which is built on SentencePiece. RobertaTokenizer instead uses byte-level Byte-Pair Encoding, and that is what this repo implements.
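
To make the difference concrete, here is a rough Python sketch of how the two schemes represent text before any vocabulary lookup. It is not this repo's API, and the function names are made up for illustration. The byte table follows the well-known GPT-2 style byte-to-character mapping that byte-level BPE tokenizers use. The SentencePiece side assumes the common default of adding a leading "▁" and replacing spaces with "▁". No merges or learned vocabulary are applied in either case.

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable character
    (GPT-2 style table used by byte-level BPE tokenizers)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            # Remaining bytes (space, control chars, ...) are shifted past U+00FF.
            table[b] = chr(256 + shift)
            shift += 1
    return table


def byte_level_surface(text):
    """Hypothetical helper: symbols a byte-level BPE would start merging from."""
    table = bytes_to_unicode()
    return "".join(table[b] for b in text.encode("utf-8"))


def sentencepiece_surface(text):
    """Hypothetical helper: whitespace marking in the SentencePiece style.
    Assumption: default settings (dummy prefix, spaces become U+2581)."""
    return "\u2581" + text.replace(" ", "\u2581")


if __name__ == "__main__":
    print(byte_level_surface("Hello world"))     # HelloĠworld
    print(sentencepiece_surface("Hello world"))  # ▁Hello▁world
```

Since the two schemes start from different symbol alphabets and use different vocabulary files, a vocabulary built for one cannot simply be loaded into the other.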