Closed Zarc98 closed 1 year ago
Training with int4 fails with:

RuntimeError: self and mat2 must have the same dtype

Training command:

```shell
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path /models/bloomz-7b1-mt \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --quantization_bit 4 \
    --output_dir bloomz_lora \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --resume_lora_training False \
    --plot_loss \
    --fp16
```

With `--quantization_bit` set to 8, the same command trains normally. Device: RTX 3080.
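For reference, the message comes from PyTorch's dtype check in matrix multiplication. A minimal, model-independent reproduction (assuming a recent PyTorch build; the tensors here are placeholders, not part of the training code):

```python
import torch

# Multiplying an fp16 tensor by an fp32 tensor trips the same dtype check
# that fires when the quantized base layer and the LoRA adapter disagree
# on dtype during the forward pass.
a = torch.randn(2, 3, dtype=torch.float16)
b = torch.randn(3, 2, dtype=torch.float32)
try:
    torch.mm(a, b)
except RuntimeError as e:
    print(e)  # "self and mat2 must have the same dtype ..."
```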
Seems related to #15.
Please update your peft library version:
pip install -U git+https://github.com/huggingface/peft.git
After updating to the latest peft version, int4 QLoRA training starts and runs normally.