Closed by zxti 4 months ago
I've hit the same error. If you've fixed it, please let me know.
Hi, you can download the tokenizer with:
mkdir data && cd data && mkdir llama && cd llama && wget https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T/blob/main/tokenizer.model && cd ../..
That URL serves a redirect, so wget will download an HTML file and save it as tokenizer.model.
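For what it's worth, here is a rough sketch of a workaround. It assumes the usual Hugging Face URL layout, where a `/blob/main/` link is the web page for a file and the matching `/resolve/main/` link returns the raw file. The helper names `hf_resolve_url` and `looks_like_html` are made up for illustration and are not part of the TinyLlama repo.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only, not from the TinyLlama repo).

# Rewrite a Hugging Face "blob" page URL into the matching "resolve" URL,
# which is assumed to return the raw file instead of the web page.
hf_resolve_url() {
    printf '%s\n' "$1" | sed 's#/blob/#/resolve/#'
}

# Succeed (exit status 0) if the first bytes of the file look like an HTML
# page, which is what a bad tokenizer.model download would contain.
looks_like_html() {
    head -c 512 "$1" | grep -qiE '<!doctype html|<html'
}

# Usage (needs network access, so it is left commented out here):
# mkdir -p data/llama
# wget -O data/llama/tokenizer.model \
#   "$(hf_resolve_url 'https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T/blob/main/tokenizer.model')"
# looks_like_html data/llama/tokenizer.model && echo "got an HTML page, not the tokenizer" >&2
```

If the check reports an HTML page, the file in data/llama is not a real tokenizer and the data prep script would likely fail when it tries to load it.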
When following PRETRAIN.md and running one of the data prep scripts:
The tokenizer throws this. It seems a checkpoint is needed first, data/llama? How do you get this?