CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 llamafactory-cli train example.yaml
Is this how you launched it?
Add DeepSpeed ZeRO-3 so the memory is sharded evenly across the GPUs. Setting CUDA_VISIBLE_DEVICES alone won't help.
Add this line to the config file: deepspeed: examples/deepspeed/ds_z3_config.json
Then launch with this command: FORCE_TORCHRUN=8 llamafactory-cli train examples/train_lora/{your_config}.yaml
(This is for single-node multi-GPU. For multi-node multi-GPU, check the README under the examples directory and adjust it yourself; I haven't tried multi-node.) A sketch of the change follows.
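A minimal sketch of that edit, assuming the stock ZeRO-3 config that ships with the repo at examples/deepspeed/ds_z3_config.json, and using example.yaml as a stand-in for your own config:

```yaml
# example.yaml: keep your existing keys, add only the deepspeed entry
model_name_or_path: llm/Qwen2-72B-Instruct
stage: sft
do_predict: true
finetuning_type: lora
deepspeed: examples/deepspeed/ds_z3_config.json  # ZeRO-3: shards weights across ranks
```

With that line in place, the same FORCE_TORCHRUN launch above shards the parameters across the 8 cards instead of replicating the full model on each one.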
predict may not support deepspeed
I just ran llamafactory-cli train example.yaml directly, and it brought up all 8 GPUs.
I'll give it a try, thanks!
predict works fine, thanks a lot.
cc @yzoaim please add deepspeed zero3 + batch predict to the documentation
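Until the docs are updated, here is an abridged sketch of what the ZeRO-3 JSON referenced above typically looks like; the authoritative version is examples/deepspeed/ds_z3_config.json in your checkout, this only shows the shape:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

"stage": 3 is the setting that makes DeepSpeed partition parameters (plus gradients and optimizer state during training) across ranks, which is what lets a 72B model fit on 8x 80G for batch predict.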
System Info
```yaml
### model
model_name_or_path: llm/Qwen2-72B-Instruct
adapter_name_or_path: saves/qwen2_7b_errata_0705/lora_ace04_instruction_v1_savesteps_10/sft

### method
stage: sft
do_predict: true
finetuning_type: lora

### dataset
dataset: prompt_to_get_cot_normal
template: qwen
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2_72b_errata_0712/lora/predict
overwrite_output_dir: true

### eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000
```
Reproduction
With 8x A100 80G, running predict on 1k samples against the 72B base model hits OOM: every GPU loads the full set of model parameters at the same time, which causes the OOM. According to the official docs 160G should be enough, yet my 80G x 8 isn't. Is this a bug, or is there some parameter I need to set?
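Rough arithmetic for why this OOMs without sharding, assuming bf16 weights and ignoring activations and KV cache:

```text
72B params x 2 bytes (bf16)       ~ 144 GB of weights
no sharding: 144 GB per GPU       > 80 GB        -> OOM on every card
ZeRO-3 over 8 GPUs: 144 GB / 8    ~ 18 GB per GPU -> fits
```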