THUchenzhou opened 6 months ago
It's in the original folder. The transformers-compatible version only needs tokenizer.json 🤗
Thanks!
It is in the original folder, but does not seem valid. Any idea?
@dejankocic The Llama 3 tokenizer is different than the one used by Llama 2. It's a BPE tokenizer built with the tiktoken library, whereas Llama 2 used sentencepiece.
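To illustrate the difference: the Llama 3 `tokenizer.model` is not a sentencepiece protobuf but a plain-text tiktoken-style file, one base64-encoded token plus its merge rank per line. Below is a minimal stdlib-only sketch of such a parser (the real `tiktoken` library ships an equivalent `load_tiktoken_bpe` helper); the three-token vocabulary here is synthetic, not the actual Llama 3 vocabulary.

```python
import base64
import os
import tempfile

def load_tiktoken_bpe(path):
    """Parse a tiktoken-style BPE file: one '<base64 token> <rank>' pair per line."""
    ranks = {}
    with open(path, "rb") as f:
        for line in f:
            if line.strip():
                token_b64, rank = line.split()
                ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks

# Write a tiny synthetic vocabulary in the same format (NOT the real Llama 3 file).
sample_tokens = [b"hello", b" world", b"!"]
sample = b"\n".join(
    base64.b64encode(tok) + b" " + str(rank).encode()
    for rank, tok in enumerate(sample_tokens)
)
path = os.path.join(tempfile.mkdtemp(), "tokenizer.model")
with open(path, "wb") as f:
    f.write(sample)

ranks = load_tiktoken_bpe(path)
```

A sentencepiece loader expects a binary protobuf, so pointing it at a file in this format fails immediately, which is consistent with the "does not seem valid" reports above.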
I am fine with everything else inside the repo I downloaded, but the tokenizer file found in the original folder does not look valid from the start. I haven't changed anything.
It seems the tokenizer.model in the provided directory fails to load properly. I'm encountering this issue while attempting to use it for training with Megatron-LM. Could you kindly offer a resolution or guidance on how to address it?
I have no idea what Megatron-LM uses to load the tokenizer, but if it relies on sentencepiece, there is nothing I can do to help, as converting anything to a sentencepiece format is pretty much impossible.
Meta-Llama-3-8B does not appear to include a file named tokenizer.model. How can I generate tokenizer.model?