Hi @ananda1996ai, can you tell me a bit more? What model do you want to train, what is your system config, how many GPUs, etc.? Also, the way you are using accelerate for training is wrong; it won't work that way.
Also, I would suggest using DeepSpeed directly for training (it's a bit complicated, but not too much). Or use the accelerate wrapper with the DeepSpeed backend. Tutorial here: https://huggingface.co/docs/accelerate/usage_guides/deepspeed
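As a rough illustration of that second route, a training script for accelerate with a DeepSpeed backend looks like the sketch below. This is a minimal example, not a recipe for your exact setup: the model name, the tiny dummy dataset, and the hyperparameters are placeholders.

```python
# Minimal sketch of training with accelerate (DeepSpeed backend selected via
# `accelerate config`). Model name, data, and hyperparameters are placeholders.
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

accelerator = Accelerator()  # picks up the DeepSpeed settings chosen in `accelerate config`

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Tiny dummy causal-LM dataset so the sketch is self-contained
enc = tokenizer(["hello world", "accelerate with deepspeed"], padding=True, return_tensors="pt")
examples = [
    {"input_ids": enc["input_ids"][i],
     "attention_mask": enc["attention_mask"][i],
     "labels": enc["input_ids"][i]}
    for i in range(enc["input_ids"].shape[0])
]
train_dataloader = DataLoader(examples, batch_size=1)

# prepare() wraps model/optimizer/dataloader for the distributed backend
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

model.train()
for batch in train_dataloader:
    loss = model(**batch).loss
    accelerator.backward(loss)  # use this instead of loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

You would first run `accelerate config` and pick DeepSpeed (plus a ZeRO stage), then start training with `accelerate launch train.py` instead of `python train.py`. The linked tutorial covers the config options in detail.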
closing this :)
I tried to load the model the same way as the bloom-accelerate-inference.py script, and then, instead of calling the generate function, added a Trainer with data loaders to train a few layers of the model (the others were frozen). I set the local_rank argument in TrainingArguments and also set trainer.is_model_parallel to True. In outline, my setup looked like the sketch below (the layer indices and the tiny dataset here are placeholders, not my actual ones):
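```python
# Rough sketch of my setup; layer indices and dataset are placeholders.
import os
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Model loaded as in bloom-accelerate-inference.py: sharded across GPUs
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Freeze everything except the last couple of transformer blocks
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("transformer.h.68", "transformer.h.69"))

train_ds = Dataset.from_dict({"input_ids": [[1, 2, 3]], "labels": [[1, 2, 3]]})  # placeholder data

args = TrainingArguments(
    output_dir="out",
    local_rank=int(os.environ.get("LOCAL_RANK", -1)),
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.is_model_parallel = True  # set by hand, since the model is sharded
trainer.train()  # this is where it fails
```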
I got the following error:
Could you please suggest what I might be doing wrong, and what the correct way would be to use the loaded distributed model for training/fine-tuning?