Currently, DeepSpeed ZeRO-3 may be incompatible with 4-bit training; consider using fp16 training instead. BTW, we do not recommend using a non-English corpus to fine-tune the LLaMA-2 models.
DeepSpeed ZeRO-2 should work, no? And yeah, I just selected webqa to demonstrate the issue; we're using an English dataset for the actual training.
DeepSpeed ZeRO-2 should be compatible with QLoRA.
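For anyone looking for a concrete starting point, a minimal sketch of an accelerate config using ZeRO-2 with fp16 (values such as `num_processes` are illustrative, not taken from this thread):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2                    # ZeRO-2, per the recommendation above
  gradient_accumulation_steps: 4
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16              # fp16 as recommended for QLoRA
num_machines: 1
num_processes: 4                   # illustrative; set to your GPU count
use_cpu: false
```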
@aldrinc did you solve the problem? I'm running into the same problem.
@hiyouga which parameter do you mean? Are you saying mixed_precision should be set to fp16?
But utilization is 0 for all GPUs.
@zhangjunyi111 FP16 is required.
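Concretely, that means setting the following in the accelerate config and passing `--fp16` to train_bash.py (the flag already appears in the command quoted below):

```yaml
# relevant line in the accelerate config; everything else can stay as-is
mixed_precision: fp16
```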
@zhangjunyi111 I did.
FP16, as @hiyouga said, worked for me.
Share your config if you still can't solve your issue.
I am confused. As I understand it, your command already includes the required `--fp16` flag; the full command is reproduced below.
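```sh
accelerate launch src/train_bash.py \
  --stage sft \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --do_train \
  --dataset webqa \
  --finetuning_type lora \
  --output_dir /workspace/webqa-test \
  --overwrite_cache \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 4 \
  --lr_scheduler_type cosine \
  --logging_steps 10 \
  --save_steps 50 \
  --learning_rate 2e-5 \
  --num_train_epochs 1.0 \
  --plot_loss \
  --fp16 \
  --quantization_bit 4 \
  --template llama2
```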
I solved my problem by updating my PyTorch from 1.3.1 to 2.0.1.
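For reference, a minimal sketch of that upgrade, assuming a pip-managed environment (the right wheel or index URL depends on your CUDA version):

```sh
# upgrade PyTorch to 2.0.1; choose the build matching your CUDA toolkit
pip install --upgrade torch==2.0.1
```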
Unfortunately, today I ran on two nodes with 13 GPUs and found that one node is normal but the other hangs. However, when I use 16 GPUs, the result is normal and I can get the model.
Where is config.yaml?
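If this refers to the accelerate config: by default, running `accelerate config` interactively writes it to a fixed cache path (standard accelerate behavior, not something stated in this thread):

```sh
# answer the interactive prompts; the result is saved to
# ~/.cache/huggingface/accelerate/default_config.yaml
accelerate config
```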
I am facing an issue with the configuration below (it was working yesterday and for the past week): the model loads and the dataset is tokenized, but then the script hangs (GPU utilization spikes to 100% on all GPUs) and no training starts. The only change is that previously I wasn't required to pass --template, but now the CLI forces me to add it.
Terminal log output attached. log_output.txt
setup.sh
Accelerate Config
Run command