hiyouga / ChatGLM-Efficient-Tuning

Fine-tuning ChatGLM-6B with PEFT

When `per_device_eval_batch_size` > 1 and launched with deepspeed: RuntimeError: Tensors must be contiguous #385

Closed: jiahuanluo closed this issue 1 year ago

jiahuanluo commented 1 year ago

`RuntimeError: Tensors must be contiguous` occurs when `per_device_eval_batch_size` > 1.

Command:

```sh
deepspeed --include localhost:0,1,2,3,4,5,6,7 --master_port $MASTER_PORT src/train_bash.py \
    --stage sft \
    --model_name_or_path THUDM/chatglm2-6b \
    --checkpoint_dir ${CHECKPOINT} \
    --do_predict \
    --dataset dev_data \
    --overwrite_cache \
    --finetuning_type lora \
    --output_dir ${CHECKPOINT}/predict \
    --per_device_eval_batch_size 4 \
    --max_source_length 1024 \
    --max_target_length 128 \
    --max_samples 1000 \
    --predict_with_generate \
    --plot_loss \
    --fp16
```
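For context, here is a minimal sketch of the likely failure mode, assuming the error is raised by a `torch.distributed` (NCCL) collective while generated sequences are gathered across ranks; `gather_predictions` is a hypothetical helper for illustration, not this repository's actual code:

```python
import torch
import torch.distributed as dist

def gather_predictions(preds: torch.Tensor) -> torch.Tensor:
    """Gather each rank's generated token IDs onto every rank.

    Assumes dist.init_process_group() has already been called.
    """
    # With per_device_eval_batch_size > 1, slicing or transposing a padded
    # batch yields a non-contiguous view over the same storage; NCCL
    # collectives reject such views with "Tensors must be contiguous".
    if not preds.is_contiguous():
        preds = preds.contiguous()  # copy the view into contiguous memory
    buffers = [torch.empty_like(preds) for _ in range(dist.get_world_size())]
    dist.all_gather(buffers, preds)
    return torch.cat(buffers, dim=0)
```

Calling `.contiguous()` (or cloning) before the collective sidesteps the error; with a batch size of 1 the tensors typically remain contiguous, which would explain why only larger eval batches trigger it.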

kuailehaha commented 1 year ago

I hit the same error with multi-GPU inference in both ChatGLM-Efficient-Tuning and LLaMA-Efficient-Tuning, whether launched with deepspeed or accelerate:

```sh
accelerate launch \
    ./LLaMA-Efficient-Tuning/src/train_bash.py \
    --max_samples 50 \
    --model_name_or_path "Llama-2-13B-fp16/" \
    --do_predict \
    --dataset alpaca_zh \
    --dataset_dir "LLaMA-Efficient-Tuning/data" \
    --finetuning_type lora \
    --output_dir Efficient_Tuning/llama2-13b \
    --per_device_eval_batch_size 4 \
    --predict_with_generate \
    --fp16
```

Waiting for a fix here.

hiyouga commented 1 year ago

Currently, `do_predict` only supports a single GPU.
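As a stopgap, running the predict stage on one GPU with a plain `python` launch should work; this is just the command above minus the deepspeed launcher (paths and flags copied from that command, adjust to your setup):

```sh
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path THUDM/chatglm2-6b \
    --checkpoint_dir ${CHECKPOINT} \
    --do_predict \
    --dataset dev_data \
    --finetuning_type lora \
    --output_dir ${CHECKPOINT}/predict \
    --per_device_eval_batch_size 4 \
    --max_source_length 1024 \
    --max_target_length 128 \
    --max_samples 1000 \
    --predict_with_generate \
    --fp16
```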

jiahuanluo commented 1 year ago

LLaMA-Efficient-Tuning used to handle this; the more the code gets updated, the more bugs show up.

kuailehaha commented 1 year ago

Right, the June version could run multi-GPU inference with `accelerate launch`.

hiyouga commented 1 year ago

@kuailehaha Please pull the latest code and try again.