Closed by molereddy 7 months ago
I am sorry that you are having to face these issues, and thanks for patiently working through them. DeepSpeed, while not critical, is how all the finetuning is done. It allows us to parallelize big models over multiple GPUs. Even though you don't see an explicit reference to DeepSpeed in the code, it is used extensively under the hood by the trainer. Let me share my exact conda environment so you can try to clone it. This may help resolve any version mismatches that are causing this behaviour on your end.
Please find the yaml file here: environment.yml.zip
Can you run the following command to make your environment and let me know if this solves the problem?
conda env create -f environment.yml
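Once the environment is created and activated, a quick sanity check along the following lines may help confirm that the relevant packages are actually visible to Python. This is only a minimal sketch: the helper name `missing_packages` is hypothetical and not part of the repository, and it only checks that the packages can be located, not that their versions match the YAML file.

```python
import importlib.util


def missing_packages(names=("deepspeed", "mpi4py")):
    """Return the top-level packages from `names` that cannot be located.

    Hypothetical helper for illustration only. It looks each package up
    with importlib without importing it, so a broken MPI install will not
    crash the check itself.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, the environment was probably not activated or not created completely.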
This helped, thanks so much!
While running finetune.py, I'm encountering MPI-related errors because of the
deepspeed='config/ds_config.json'
argument. How essential is DeepSpeed for using the repository? I see that it is used only in the KL-divergence-based forget loss. So far I have had to troubleshoot several DeepSpeed-related issues, such as mpi4py installs. In general, it seems I'm not alone in facing issues with DeepSpeed; see https://www.reddit.com/r/Oobabooga/comments/13etobg/using_deepspeed_requires_lots_of_manual_tweaking/
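For illustration, one way the argument could be made optional is sketched below. It assumes the trainer arguments are assembled as keyword arguments and that the argument is named `deepspeed`, as in the snippet above. The helper `optional_deepspeed_kwargs` and its `enabled` switch are hypothetical, and whether the rest of finetune.py runs correctly without DeepSpeed is not confirmed here.

```python
import importlib.util


def optional_deepspeed_kwargs(config_path="config/ds_config.json", enabled=True):
    """Return {'deepspeed': config_path} only if DeepSpeed can be located.

    Hypothetical helper for illustration only. When DeepSpeed is disabled
    or not installed, an empty dict is returned so the `deepspeed` keyword
    is simply left out of the trainer arguments.
    """
    if enabled and importlib.util.find_spec("deepspeed") is not None:
        return {"deepspeed": config_path}
    return {}


if __name__ == "__main__":
    # The resulting dict could be unpacked into the existing argument list,
    # for example some_args_constructor(..., **optional_deepspeed_kwargs()).
    print(optional_deepspeed_kwargs())
```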