lorabit110 opened this issue 1 year ago
It's likely caused by the multi-GPU system (AWS EC2 p3.8xlarge with 4 V100s) I used. I tried using another single-GPU VM (AWS EC2 g3.2xlarge with 1 A10) and it worked: the train loss is no longer zero and the eval loss is no longer NaN.
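In case it helps anyone else who hits this on a multi-GPU box before proper support lands, a possible stopgap (which I have not tested beyond switching VMs) is to hide the extra GPUs from the process so the script only ever sees one device. A minimal sketch, assuming the standard CUDA_VISIBLE_DEVICES mechanism rather than anything specific to finetune.py:

```python
# Stopgap sketch: expose only the first GPU to the process so the training
# script behaves as if it were on a single-GPU machine. This has to run
# before torch (or anything else that initializes CUDA) is imported.
import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import torch
print(torch.cuda.device_count())  # should report 1 even on a 4-GPU p3.8xlarge
```

Setting it on the command line works just as well, e.g. `CUDA_VISIBLE_DEVICES=0 python src/finetune.py ...`.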
To be honest, I built this for running on a single-GPU system, so the code puts the entire model on the first GPU. I will try to get around to implementing multi-GPU support as soon as possible, but I am working on some other things as well. If you don't want to wait, the DeepSpeed library will most likely be the move. I wouldn't recommend Hugging Face's built-in naive approach (device_map="auto", etc.), since it splits the model across GPUs but runs them one after another, which will actually slow down training: each GPU sits idle while another one is working.
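To make that concrete, here is a rough sketch of the two loading strategies. It is not the exact code in finetune.py, and the model name and dtype are just placeholders:

```python
import torch
from transformers import AutoModelForCausalLM

# What the repo currently does, roughly: the whole model lives on cuda:0,
# so any additional GPUs are simply unused.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda:0")

# The naive multi-GPU alternative: device_map="auto" shards the layers
# across GPUs, but the forward pass still runs those shards one after the
# other, so only one GPU is busy at any moment -- hence the slowdown.
# model = AutoModelForCausalLM.from_pretrained(
#     "mosaicml/mpt-7b-instruct",
#     torch_dtype=torch.float16,
#     trust_remote_code=True,
#     device_map="auto",
# )
```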
Never mind, I ended up deciding to give it a go. Just install DeepSpeed and run with the deepspeed command instead of python and you're set. Example:

```
deepspeed src/finetune.py \
    --base_model 'mosaicml/mpt-7b-instruct' \
    --data_path 'yahma/alpaca-cleaned' \
    --output_dir './lora-mpt' \
    --lora_target_modules '[Wqkv]' \
    --lora_r 8 \
    --cutoff_len 768 \
    --batch_size 128 \
    --micro_batch_size 8
```
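For context on why this helps: the deepspeed launcher starts one worker process per GPU and splits each batch across them (data parallelism), so all four V100s on a p3.8xlarge stay busy instead of taking turns. If you want to limit how many devices it uses, the launcher also takes a `--num_gpus` flag, e.g. `deepspeed --num_gpus=2 src/finetune.py ...`.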
Let me know if you have any further issues.
I tried LoRA tuning mpt-7b and mpt-7b-instruct. I get a run summary like this:

```
wandb: Run summary:
wandb: eval/loss nan
wandb: eval/runtime 37.2157
wandb: eval/samples_per_second 53.741
wandb: eval/steps_per_second 1.693
wandb: train/epoch 0.6
wandb: train/global_step 234
wandb: train/total_flos 2.9642010569107046e+17
wandb: train/train_loss 0.0
wandb: train/train_runtime 1465.8832
wandb: train/train_samples_per_second 20.367
wandb: train/train_steps_per_second 0.16
```
But the train/loss is always 0 and eval/loss is always nan. Also, when I load the model using generate.py, it always generates "&". I have tried both yahma/alpaca-cleaned and a manually created simple dataset.
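If it helps narrow things down, something like the callback below could be dropped into the training script to flag the first step at which the loss goes to 0 or NaN, instead of only seeing it in the final wandb summary. This is just a debugging sketch and assumes finetune.py uses a transformers.Trainer; adapt as needed:

```python
import math

from transformers import TrainerCallback

class BadLossCallback(TrainerCallback):
    """Print a warning the moment a logged loss becomes 0 or NaN."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        for key in ("loss", "eval_loss"):
            value = (logs or {}).get(key)
            if value is not None and (value == 0.0 or math.isnan(value)):
                print(f"[step {state.global_step}] suspicious {key}: {value}")

# Attach it after the Trainer is constructed:
# trainer.add_callback(BadLossCallback())
```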