Closed by yurinoviello 8 months ago
Yep, I recently tried a similar setup with Mistral.
If you remove the safetensors file in the checkpoint folder, loading should pick up the adapter_model.bin successfully.
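A minimal sketch of that workaround, assuming a PEFT-style checkpoint; the checkpoint path and base model name are assumptions, not taken from the issue:

```python
import os
from transformers import AutoModelForCausalLM
from peft import PeftModel

checkpoint_dir = "outputs/checkpoint-1000"  # hypothetical path, adjust to your run

# Remove the safetensors file so that PEFT falls back to adapter_model.bin.
safetensors_path = os.path.join(checkpoint_dir, "model.safetensors")
if os.path.exists(safetensors_path):
    os.remove(safetensors_path)

# Load the base model and attach the LoRA adapter from the checkpoint folder.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, checkpoint_dir)
```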
Yes, you are right, removing the safetensor is the key.
Thanks so much for the quick response.
Hello, I am using the big-refactor branch. I am running an experiment with LoRA fine-tuning on Mistral; however, even though the process finishes successfully, I am not able to load the adapter in any way.
Error:
Looking at different forums, I found that this can be a common problem when using DeepSpeed ZeRO-3 with LoRA; however, when I run the fine-tuning with a ZeRO-2 or ZeRO-1 config, I get the following exception.
AssertionError: The parameter 447 has already been reduced. Gradient computed twice for this partition. Multiple gradient reduction is currently not supported
I am using the default configuration with a custom dataset.
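For reference, a ZeRO-2 config of the kind mentioned above typically looks roughly like the sketch below; the values are illustrative assumptions, not the actual settings used in this run:

```python
# Hypothetical minimal DeepSpeed ZeRO-2 config, passed to the HF Trainer
# e.g. via TrainingArguments(deepspeed=ds_config_zero2, ...).
ds_config_zero2 = {
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": "auto"},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
```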