huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0

How to use the XLMRoBERTa tokenizer? #689

Closed: hapazv closed this issue 3 years ago

hapazv commented 3 years ago

Good day.

First, sorry for my English.

Code using 'bert-base-multilingual-cased':

To tokenize the phrases, I did it like this and got a good result:

[Screenshot: BERT tokenizer]

However, when I tried to do the same with XLMRoBERTa, I did not get good results. How should I do this with the other model? I appreciate the help.
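The original screenshots are not available, so the sketch below is not the poster's code. It assumes the `transformers` library is installed, and the model id `xlm-roberta-base` is an assumption chosen for illustration. `tokenize_with` loads either tokenizer through `AutoTokenizer.from_pretrained` and calls `tokenize`. The two merge helpers (hypothetical names) show why the outputs look different. BERT's WordPiece marks a word *continuation* with `##`. XLM-R's SentencePiece marks a word *start* with `▁` (U+2581).

```python
from typing import List

# U+2581 "LOWER ONE EIGHTH BLOCK": SentencePiece's word-start marker.
SP_SPACE = "\u2581"


def tokenize_with(model_name: str, text: str) -> List[str]:
    """Tokenize `text` with a Hugging Face tokenizer (assumes `transformers` is installed)."""
    # Imported lazily so the pure helpers below work without transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return tokenizer.tokenize(text)


def wordpiece_to_words(tokens: List[str]) -> List[str]:
    """Merge BERT WordPiece tokens: a '##' prefix continues the previous word."""
    words: List[str] = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return words


def sentencepiece_to_words(pieces: List[str]) -> List[str]:
    """Merge SentencePiece pieces: a leading '▁' starts a new word."""
    words: List[str] = []
    for piece in pieces:
        if piece.startswith(SP_SPACE) or not words:
            words.append(piece.lstrip(SP_SPACE))
        else:
            words[-1] += piece
    return words
```

For example, call `tokenize_with("bert-base-multilingual-cased", text)` or `tokenize_with("xlm-roberta-base", text)`. On first use, this downloads the tokenizer files. Pass the result to the matching merge helper to check that the pieces rebuild the original words.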

[Screenshot: RoBERTa tokenizer]

stefan-it commented 3 years ago

Hi @hapazv, could you check that you've installed the sentencepiece library? If not, please install it via pip3 install sentencepiece and check whether it works 🤗
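A quick way to follow this advice is to check whether `sentencepiece` is importable in the current environment before loading the XLM-RoBERTa tokenizer. The sketch below uses only the standard library. The function name `sentencepiece_available` is hypothetical. If the package was installed while a Python session or notebook kernel was already running, you may need to restart it before `transformers` picks the package up.

```python
import importlib.util


def sentencepiece_available() -> bool:
    """Return True if the `sentencepiece` package can be imported here."""
    # find_spec returns None when the top-level module is not installed.
    return importlib.util.find_spec("sentencepiece") is not None


if not sentencepiece_available():
    print("sentencepiece is missing; install it with: pip3 install sentencepiece")
```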

hapazv commented 3 years ago

> Hi @hapazv, could you check that you've installed the sentencepiece library? If not, please install it via pip3 install sentencepiece and check whether it works 🤗

@stefan-it Thank you very much, that was silly of me. It worked perfectly.