I spent about an hour and a half on this on August 9th and couldn't figure it out.
@keyonvafa and I made a fair bit of progress on this over the weekend. Recall that the problem was that we initially uploaded only three files to the RA environment: consolidated.00.pth, tokenizer.model, and params.json. These are the files included in meta-llama/Meta-Llama-3-8B/original/.
The TLDR is that this is unresolved for now. I think it's best to wait for CBS to upload our new model files to the RA environment.
You can take a look at the code in recipes/quickstart/reading_pth_file/reproducing_erorr.py.
We were able to successfully load the model weights through the consolidated.00.pth file.
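For reference, loading the raw weights is roughly this (a minimal sketch, not the exact contents of reproducing_erorr.py; the checkpoint path is a placeholder for wherever the file sits in the RA environment):

```python
import torch

# The Meta checkpoint is just a state dict keyed by parameter name;
# the path below is a placeholder.
state_dict = torch.load("original/consolidated.00.pth", map_location="cpu")
print(len(state_dict), "tensors loaded")
```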
However, in order to do fine-tuning we need the tokenizer as well. The only way I could manage to load a tokenizer was by copying tokenizer.json and tokenizer_config.json over from the meta-llama/Meta-Llama-3-8B/ directory. These are big files (roughly 2k and 40k lines), so I can't copy them in manually. We could try uploading them, but that puts us back in the original position of waiting on the CBS bottleneck.
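For context, the workaround that did work is just pointing transformers at a directory that contains those two JSON files (sketch; the directory path is a placeholder):

```python
from transformers import AutoTokenizer

# Works only if tokenizer.json and tokenizer_config.json have been copied
# into this directory (placeholder path) from meta-llama/Meta-Llama-3-8B/.
tokenizer = AutoTokenizer.from_pretrained("/path/to/Meta-Llama-3-8B")
print(tokenizer("hello world"))
```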
I also tried another way around this using the SentencePiece package directly; however, I kept running into this error: An error occurred using SentencePiece: Internal: could not parse ModelProto from <path to tokenizer.model>. Some reading online leads me to believe this is because Meta moved from SentencePiece to tiktoken for Llama-3. A solution using tiktoken would require us to update the OSSC environment, and given that one of the goals tomorrow is getting through dependency hell, I don't think it's a good idea to update it before then.
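For the record, the SentencePiece attempt was roughly the following (sketch; the file path is a placeholder). It fails because Llama-3's tokenizer.model is a tiktoken-style file rather than a SentencePiece ModelProto:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
# Raises an error on Llama-3's tokenizer.model:
#   Internal: could not parse ModelProto from <path to tokenizer.model>
sp.Load("original/tokenizer.model")
```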
If the model works but we don't have the right tokenizer, it will still be useful to run it to get a sense of computation time (since that doesn't depend on having the right tokenizer)
@keyonvafa gotcha, but wouldn't we need some sort of tokenizer? If the issue above stands, my understanding is that we don't have any tokenizer that works in the RA environment (i.e., not just an incorrect one)
Another tidbit:
tokenizer.model (by itself) worked as the base tokenizer file for torchtune. Looking into the torchtune documentation, I see that we can read in the tokenizer if we use tiktoken: torchtune/models/llama3/_tokenizer.py
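If we go the torchtune route, its docs suggest the tiktoken-backed tokenizer is built from tokenizer.model alone, something like the sketch below (the import path and signature may differ by torchtune version, and the file path is a placeholder):

```python
from torchtune.models.llama3 import llama3_tokenizer

# Builds torchtune's tiktoken-based Llama3Tokenizer directly from the
# tokenizer.model file we already have in the RA environment.
tokenizer = llama3_tokenizer("original/tokenizer.model")
print(tokenizer.encode("hello world"))
```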
Closing the loop: For the initial OSSC runs on August 14th and 15th, we imported config.json, tokenizer.json and tokenizer_config.json into OSSC to get the model to run.
Models that are saved in the consolidated.pth format don't appear to be supported by llama-recipes. This is because the model is loaded using LlamaForCausalLM.from_pretrained() from the transformers library (see src/llama_recipes/finetuning.py), which expects a Hugging Face-format checkpoint (.safetensors or .bin), not the raw consolidated .pth files.
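To make the constraint concrete, this is essentially the call llama-recipes makes, so whatever directory we point it at has to be in Hugging Face format (sketch; the path is a placeholder):

```python
from transformers import LlamaForCausalLM

# from_pretrained expects a Hugging Face-format checkpoint directory
# (config.json plus *.safetensors / pytorch_model*.bin shards); it cannot
# read Meta's raw consolidated.00.pth directly.
model = LlamaForCausalLM.from_pretrained("/path/to/Meta-Llama-3-8B-hf")
```

Transformers does ship a conversion script (src/transformers/models/llama/convert_llama_weights_to_hf.py) that turns the original consolidated checkpoints into this format, so that may be another route if we can run it inside the RA environment.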