narayanasastry-rvds opened this issue 1 month ago
I had the same issue. Your checkpoints are saved in "./checkpoints" by default. I ended up just checking the modification times to find the right one, lol. Then you can merge.
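Something like this does the trick, assuming the default HF Trainer layout of checkpoint-<step> sub-folders (the folder name below is just a placeholder for your own run):

```python
import os

# Placeholder output folder; point this at your own run.
ckpt_root = "./checkpoints/llava-v1.6-7b-lora"

# The HF Trainer writes checkpoint-<step> sub-folders; pick the most
# recently modified one.
candidates = [
    os.path.join(ckpt_root, d)
    for d in os.listdir(ckpt_root)
    if d.startswith("checkpoint-")
]
print("Most recent checkpoint:", max(candidates, key=os.path.getmtime))
```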
Hope this helps.
Thanks for the reply. I have been passing the correct checkpoints folder. I haven't been using the merge script from this repo because it throws an error like this:
OSError: LLava_fine_tune/checkpoints/llava-v1.6-7b-lora/checkpoint-159 does not appear to have a file named config.json. Checkout 'https://huggingface.co/LLava_fine_tune/checkpoints/llava-v1.6-7b-lora/checkpoint-159/tree/main' for available files.
I think it is looking for the checkpoint on Hugging Face, but I have it locally. Do you know how to fix this?
I had this issue too. Look into the script finetune_task_lora.sh: copy and paste what you have for your model into that script, and it should work. I had to do that. Also, perhaps you are giving the wrong relative path? Try giving it an absolute path to your checkpoints.
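For the merge itself, something along these lines worked for me. If I remember right, LLaVA only writes config.json (and non_lora_trainables.bin) to the top-level output folder at the end of training, so pointing the merge at an intermediate checkpoint-159 folder can trigger exactly this error; point it at the folder that actually contains config.json. The paths below are placeholders, the base-model name is an assumption (use whatever you fine-tuned from), and the flags are the ones scripts/merge_lora_weights.py in the LLaVA repo exposes, so double-check against your copy:

```python
import os
import subprocess

# Placeholder paths; substitute your own. The base model is an assumption --
# use whichever model you actually fine-tuned from.
lora_dir = os.path.abspath("LLava_fine_tune/checkpoints/llava-v1.6-7b-lora")
base_model = "liuhaotian/llava-v1.6-vicuna-7b"
out_dir = os.path.abspath("LLava_fine_tune/checkpoints/llava-v1.6-7b-lora-merged")

# Call LLaVA's merge helper (run this from the LLaVA repo root) with absolute
# paths so from_pretrained never mistakes a local path for a Hub repo id.
subprocess.run(
    [
        "python", "scripts/merge_lora_weights.py",
        "--model-path", lora_dir,
        "--model-base", base_model,
        "--save-model-path", out_dir,
    ],
    check=True,
)
```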
Thanks for the reply again. I am now able to merge the checkpoints, but the next problem is that the merged model does not give any output after "Assistant: ", whereas the base model generates a response.
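One quick way to narrow this down is to run the same image and question through the base model, the merged model, and the un-merged LoRA loaded on top of the base. If only the LoRA/merged runs come back empty, the adapter weights (rather than the merge step) are the likely culprit. A rough sketch, assuming the llava.serve.cli entry point and its --model-path / --model-base / --image-file flags; the local paths and image URL are placeholders:

```python
import subprocess

image = "https://llava-vl.github.io/static/images/view.jpg"   # any test image
base = "liuhaotian/llava-v1.6-vicuna-7b"                      # assumed base model
merged = "/abs/path/to/llava-v1.6-7b-lora-merged"             # placeholder
lora_dir = "/abs/path/to/llava-v1.6-7b-lora"                  # placeholder

def chat(extra_args):
    # llava.serve.cli opens an interactive prompt; ask the same question each run.
    subprocess.run(
        ["python", "-m", "llava.serve.cli", "--image-file", image] + extra_args,
        check=True,
    )

chat(["--model-path", base])                            # 1) base model alone
chat(["--model-path", merged])                          # 2) merged LoRA model
chat(["--model-path", lora_dir, "--model-base", base])  # 3) LoRA on base, no merge
```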
Describe the issue
Issue: LoRA fine-tuning with Zero2.json and also Zero3.json. During fine-tuning, the train and validation losses decrease, but when I look at the weights in the saved model checkpoint, it contains only initialized weights.
Command:
Log:
Screenshots: train and validation loss during model fine-tuning are shown in the plot below.
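One way to confirm that the saved adapter really contains only initialized weights: PEFT initializes the lora_B matrices to zero, so after successful training their norms should be non-zero. A minimal sketch, assuming the adapter was saved as adapter_model.bin in the checkpoint folder (newer PEFT versions write adapter_model.safetensors instead) and that the path below is replaced with your own:

```python
import os
import torch

# Placeholder path; point this at your own LoRA checkpoint folder.
ckpt = "./checkpoints/llava-v1.6-7b-lora"
adapter_file = os.path.join(ckpt, "adapter_model.bin")

# PEFT initializes lora_B to zeros, so non-zero norms here indicate the
# adapter actually captured the fine-tuning updates.
state = torch.load(adapter_file, map_location="cpu")
for name, tensor in state.items():
    if "lora_B" in name:
        print(f"{name}: norm = {tensor.float().norm().item():.6f}")
```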