modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (Qwen2.5, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Training qwen1.5-moe-A2.7B-chat is slow and GPU utilization is low #868

Closed — yangzhipeng1108 closed this issue 4 months ago

yangzhipeng1108 commented 4 months ago

CUDA_VISIBLE_DEVICES=0 \
python3 llm_sft.py \
    --model_type qwen1half-moe-a2_7b-chat \
    --model_id_or_path /root/yovole/qwen/Qwen1.5-MoE-A2.7B-Chat \
    --sft_type lora \
    --tuner_backend swift \
    --dtype AUTO \
    --output_dir output \
    --dataset dureader-robust-zh \
    --train_dataset_sample 10000 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
    --self_cognition_sample 1000 \
    --custom_train_dataset_path /root/yovole/qwen/data/alpaca-gpt4-data-zh/alpaca_gpt4_data_zh.json \
    --custom_val_dataset_path /root/yovole/qwen/data/alpaca-gpt4-data-zh/alpaca_gpt4_data_zh.json \
    --model_name 卡卡罗特 \
    --model_author 陶白白

[screenshots attached showing training speed and GPU utilization]

tastelikefeet commented 4 months ago

This is abnormally slow. Check the devices the model's parameters are placed on — some layers may have been offloaded to the CPU.
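One quick way to perform the check suggested above is to summarize which devices the loaded model's parameters landed on. This is a minimal sketch, not part of ms-swift: `summarize_devices` is a hypothetical helper, and with a real model you would feed it `((name, str(p.device)) for name, p in model.named_parameters())`.

```python
# Count parameters per device and flag anything not on a CUDA device,
# which would indicate CPU offloading (a common cause of slow training).
from collections import Counter

def summarize_devices(named_params):
    """Given (name, device_string) pairs, return per-device counts and
    the list of non-CUDA devices that hold parameters."""
    counts = Counter(device for _name, device in named_params)
    offloaded = [d for d in counts if not d.startswith("cuda")]
    return counts, offloaded

# Illustrative stand-in data; in practice this comes from the loaded model.
params = [
    ("model.layers.0.mlp.experts.0.w1", "cuda:0"),
    ("model.layers.0.mlp.experts.1.w1", "cpu"),   # offloaded -> slow training
    ("lm_head.weight", "cuda:0"),
]
counts, offloaded = summarize_devices(params)
print(counts)      # Counter({'cuda:0': 2, 'cpu': 1})
print(offloaded)   # ['cpu'] -> some parameters were offloaded to the CPU
```

If `offloaded` is non-empty, the model did not fully fit (or was not fully mapped) onto the GPU, which matches the symptom reported here.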

cdxzyc commented 3 months ago

Has this been resolved? I'm running into the same problem.