mgrankin / ru_transformers

Apache License 2.0
776 stars 108 forks

Didn't find files vocab.json and merges.txt in model #7

Closed svarog-vlz closed 4 years ago

svarog-vlz commented 4 years ago

Hi! When I try to run with the models downloaded from AWS, I get these warnings:

```
I1227 05:17:21.238354 140364474943296 tokenization_utils.py:335] Didn't find file gpt2/s_checkpoint-1900000/vocab.json. We won't load it.
I1227 05:17:21.238430 140364474943296 tokenization_utils.py:335] Didn't find file gpt2/s_checkpoint-1900000/merges.txt. We won't load it.
I1227 05:17:21.238494 140364474943296 tokenization_utils.py:359] Didn't find file gpt2/s_checkpoint-1900000/added_tokens.json. We won't load it.
I1227 05:17:21.238545 140364474943296 tokenization_utils.py:359] Didn't find file gpt2/s_checkpoint-1900000/special_tokens_map.json. We won't load it.
I1227 05:17:21.238594 140364474943296 tokenization_utils.py:359] Didn't find file gpt2/s_checkpoint-1900000/tokenizer_config.json. We won't load it.
```

Where can I find vocab.json and merges.txt, or the other tokenization files? I tried starting with the files from the "bpe" directory, but nothing happened. Thanks.

mgrankin commented 4 years ago

You don't need vocab.json or merges.txt to run the model. Use the YTTM tokenizer with the encoder.model file.

svarog-vlz commented 4 years ago

Thanks, it's working!