varunsatish / llama-recipes-fertility


Using consolidated.pth #20

Closed varunsatish closed 2 months ago

varunsatish commented 2 months ago

Models saved in the consolidated.pth format don't appear to be supported by llama-recipes. The model is loaded with LlamaForCausalLM.from_pretrained() from the transformers library (see src/llama_recipes/finetuning.py), which expects a Hugging Face-format checkpoint (config.json plus .safetensors weight shards) rather than Meta's consolidated .pth files.
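For reference, here is roughly what that loading path looks like (a minimal sketch; the exact arguments in src/llama_recipes/finetuning.py differ):

```python
# Minimal sketch of the llama-recipes loading path (exact kwargs in
# src/llama_recipes/finetuning.py differ).
from transformers import LlamaForCausalLM

# Works when the directory contains a Hugging Face checkpoint
# (config.json + *.safetensors shards).
model = LlamaForCausalLM.from_pretrained("path/to/hf_checkpoint")

# Fails when the directory only contains Meta's original export
# (consolidated.00.pth, params.json, tokenizer.model), because
# from_pretrained() looks for config.json and HF-format weight files.
```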

varunsatish commented 2 months ago

I spent about an hour and a half on this on August 9th and couldn't figure it out.

varunsatish commented 2 months ago

@keyonvafa and I made a fair bit of progress on this over the weekend. Recall that the problem was that we initially uploaded only three files to the RA environment: consolidated.00.pth, tokenizer.model, and params.json. These are the files included in meta-llama/Meta-Llama-3-8B/original/.

The TLDR is that this is unresolved for now. I think it's best to wait for CBS to upload our new model files to the RA environment.

You can take a look at the code in recipes/quickstart/reading_pth_file/reproducing_erorr.py.

We were able to load the model weights successfully from the consolidated.00.pth file.
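For reference, the raw weights can be inspected directly with torch.load (a minimal sketch; note the keys follow Meta's naming, not Hugging Face's):

```python
import torch

# Load Meta's original checkpoint; this is just a plain state dict.
state_dict = torch.load("original/consolidated.00.pth", map_location="cpu")

# Keys use Meta's layout (e.g. "layers.0.attention.wq.weight") rather than
# Hugging Face's ("model.layers.0.self_attn.q_proj.weight"), so they would
# still need to be remapped before LlamaForCausalLM could use them.
print(len(state_dict), next(iter(state_dict)))
```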

However, in order to do fine-tuning we need the tokenizer as well. The only way I could manage to load a tokenizer was by copying tokenizer.json and tokenizer_config.json over from the meta-llama/Meta-Llama-3-8B/ directory.

These are big files (roughly 2k and 40k lines of JSON), so I can't copy them over by hand. We could try uploading them, but that puts us back in the original position of waiting on the CBS bottleneck.
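For reference, once those two JSON files sit next to the weights, the tokenizer loads the usual way (a sketch; the directory path is a placeholder):

```python
from transformers import AutoTokenizer

# Assumes tokenizer.json and tokenizer_config.json have been copied into
# model_dir; tokenizer.model alone is not enough for transformers here.
tokenizer = AutoTokenizer.from_pretrained("path/to/model_dir")
print(tokenizer("hello world")["input_ids"])
```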

I also tried another way around this using the SentencePiece package directly, but I kept running into this error: An error occurred using SentencePiece: Internal: could not parse ModelProto from <path to tokenizer.model>. Some reading online leads me to believe this is because Meta moved from SentencePiece to a tiktoken-based tokenizer in Llama-3, so tokenizer.model is no longer a SentencePiece model. A solution using tiktoken would require updating the OSSC environment, and given that one of tomorrow's goals is getting through dependency hell, I don't think it's a good idea to update it before then.
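For completeness, this is roughly the SentencePiece attempt that fails (a sketch; the path is a placeholder):

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
# Raises "Internal: could not parse ModelProto from ..." because the Llama-3
# tokenizer.model is a tiktoken-style BPE ranks file, not a SentencePiece model.
sp.load("original/tokenizer.model")
```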

keyonvafa commented 2 months ago

If the model works but we don't have the right tokenizer, it will still be useful to run it to get a sense of computation time (since that doesn't depend on having the right tokenizer).

varunsatish commented 2 months ago

@keyonvafa gotcha, but wouldn't we need some sort of tokenizer? If the thread above still stands, my understanding is that we don't have any tokenizer that works in the RA environment (i.e. not just that we have the wrong one, but that we have none at all).

varunsatish commented 2 months ago

Another tidbit:

tokenizer.model (by itself) worked as the base tokenizer file for torchtune. Looking into the torchtune documentation, I see that we can read in the tokenizer if we use tiktoken: torchtune/models/llama3/_tokenizer.py
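A rough sketch of that route (assuming torchtune's llama3_tokenizer builder, which wraps the tiktoken-based tokenizer in torchtune/models/llama3/_tokenizer.py, is available in the environment):

```python
from torchtune.models.llama3 import llama3_tokenizer

# torchtune reads the original tokenizer.model directly via tiktoken, so no
# tokenizer.json / tokenizer_config.json is needed.
tokenizer = llama3_tokenizer(path="original/tokenizer.model")
tokens = tokenizer.encode("hello world")
```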

varunsatish commented 2 months ago

Closing the loop: For the initial OSSC runs on August 14th and 15th, we imported config.json, tokenizer.json and tokenizer_config.json into OSSC to get the model to run.