Closed wolf-li closed 3 months ago
Not every model on the Hugging Face Hub has a tokenizer.json file. Marian models, for example, ship with 'tokenizer_config.json', 'special_tokens_map.json', 'vocab.json', 'source.spm', 'target.spm', and 'added_tokens.json' — that's a lot of files. What should I do?
vocab.json can be loaded and parsed to recover the tokenizer's vocabulary info.
One common approach so far seems to be converting the other tokenizer formats into HF's tokenizer.json format.
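As a minimal sketch of that conversion, here is how a tokenizer.json can be built from a vocabulary with the `tokenizers` library. The tiny `vocab` dict below is a hypothetical stand-in for a model's vocab.json, and the word-level model is a simplification: real Marian models use SentencePiece (source.spm/target.spm), so a faithful conversion would build a Unigram model from the .spm files instead.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy vocab standing in for the model's vocab.json (hypothetical contents).
vocab = {"<unk>": 0, "hello": 1, "world": 2}

# Build a word-level tokenizer over that vocab and write it out in the
# single-file tokenizer.json format that HF fast tokenizers expect.
tokenizer = Tokenizer(WordLevel(vocab, unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()
tokenizer.save("tokenizer.json")

# The saved file round-trips as a standalone fast tokenizer.
reloaded = Tokenizer.from_file("tokenizer.json")
```

Once a tokenizer.json exists, libraries that only consume that single-file format can load the model's tokenizer without the original multi-file layout.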