wietsedv / bertje

BERTje is a Dutch pre-trained BERT model developed at the University of Groningen. (EMNLP Findings 2020) "What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models"
https://aclanthology.org/2020.findings-emnlp.389/
Apache License 2.0

OS error when using 'bert-base-dutch-cased' #10

Closed: LiesjevdLinden closed this issue 4 years ago

LiesjevdLinden commented 4 years ago

Hi! I'm using the model for my thesis. Loading it from transformers with the suggested code used to work, but it now gives the following error:

OSError: Model name 'bert-base-dutch-cased' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, TurkuNLP/bert-base-finnish-cased-v1, TurkuNLP/bert-base-finnish-uncased-v1, wietsedv/bert-base-dutch-cased). We assumed 'bert-base-dutch-cased' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
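For context, this is roughly the call that triggers it (a minimal sketch; I'm assuming the unprefixed short name and BertTokenizer as in the earlier readme):

```python
from transformers import BertTokenizer

# The bare short name (no user prefix) now raises the OSError above
tokenizer = BertTokenizer.from_pretrained("bert-base-dutch-cased")
```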

It does work when I use 'wietsedv/bert-base-dutch-cased', which is in the model name list according to the error message. Has the name perhaps changed in transformers?

wietsedv commented 4 years ago

Thanks for pointing this out! Apparently Hugging Face has added the user/organisation prefixes to the shortcut names, so you should indeed now use wietsedv/bert-base-dutch-cased. Ultimately, I agree with this choice by Hugging Face.
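For anyone else hitting this, a minimal sketch of loading the model with the prefixed identifier (using the standard BertTokenizer/BertModel classes; swap in whichever task-specific head you need):

```python
from transformers import BertTokenizer, BertModel

# Load BERTje via the user-prefixed Hub identifier; the old bare short name
# is no longer resolved by recent transformers releases
tokenizer = BertTokenizer.from_pretrained("wietsedv/bert-base-dutch-cased")
model = BertModel.from_pretrained("wietsedv/bert-base-dutch-cased")
```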

I've updated the readme, so I'm closing this issue.