microsoft / DeBERTa

The implementation of DeBERTa
MIT License

Can't load DeBERTa-v3 tokenizer #70

Closed: maiiabocharova closed this issue 2 years ago

maiiabocharova commented 2 years ago
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

This gives me an error: ValueError: This tokenizer cannot be instantiated. Please make sure you have sentencepiece installed in order to use this tokenizer. However, sentencepiece is already installed.

I also tried:

!pip install deberta
from DeBERTa import deberta
vocab_path, vocab_type = deberta.load_vocab(pretrained_id='base-v3')
tokenizer = deberta.tokenizers[vocab_type](vocab_path)

This gives me TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Please help: how can I use the tokenizer for deberta-v3-base?

chrischowfy commented 2 years ago

from transformers import DebertaV2Tokenizer, DebertaV2Model
tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v3-base")

This works for me.

maiiabocharova commented 2 years ago

Thank you, I was able to initialize the tokenizer, but it later raises an error when I pass text to it:

tokenizer("Some text")

TypeError: 'NoneType' object is not callable

chrischowfy commented 2 years ago

It's weird. Maybe the text you tokenized wasn't processed properly.

maiiabocharova commented 2 years ago

Hello, the issue was that I was using Colab and the tokenizer needs sentencepiece to be installed. The solution was to install sentencepiece and then restart the runtime. (I didn't restart it at first.)

Thank you for sharing the model!
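
For anyone hitting the same errors, here is a minimal sketch of a pre-flight check that could be run before loading the tokenizer. The helper names missing_packages and load_deberta_v3_tokenizer are hypothetical and not part of transformers or DeBERTa. The check only confirms that sentencepiece can be found in the environment. It cannot tell whether the runtime was restarted after installing it.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment.

    Hypothetical helper for illustration; expects top-level package names.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def load_deberta_v3_tokenizer(model_id="microsoft/deberta-v3-base"):
    """Load the tokenizer, failing early with a clear message if a dependency is absent.

    Hypothetical wrapper; the DebertaV2Tokenizer call itself is the one from this thread.
    """
    missing = missing_packages(["sentencepiece", "transformers"])
    if missing:
        raise RuntimeError(
            "Missing packages: " + ", ".join(missing)
            + ". Install them (e.g. pip install sentencepiece) and restart the runtime."
        )
    # Imported here so the dependency check above runs first.
    from transformers import DebertaV2Tokenizer

    return DebertaV2Tokenizer.from_pretrained(model_id)
```

Usage would be tokenizer = load_deberta_v3_tokenizer() followed by tokenizer("Some text"). If the check reports nothing missing but the errors above still appear, the likely cause (judging from this thread) is that transformers was imported before sentencepiece was installed, so restarting the runtime is still required.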