CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 llamafactory-cli train example.yaml
Is this how you launched it?
Add DeepSpeed ZeRO-3 so the memory is sharded evenly across the GPUs. Setting CUDA_VISIBLE_DEVICES alone won't help.
Add this line to the config file: deepspeed: examples/deepspeed/ds_z3_config.json
Then launch with this command: FORCE_TORCHRUN=8 llamafactory-cli train examples/train_lora/{your_config}.yaml
(This is for single-node multi-GPU. For multi-node multi-GPU, check the README under the examples directory and adjust it yourself; I haven't tried multi-node.) A sketch of the change follows.
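A minimal sketch of that edit, assuming the stock ZeRO-3 config that ships with the repo at examples/deepspeed/ds_z3_config.json, and using example.yaml as a stand-in for your own config:

```yaml
# example.yaml: keep your existing keys, add only the deepspeed entry
model_name_or_path: llm/Qwen2-72B-Instruct
stage: sft
do_predict: true
finetuning_type: lora
deepspeed: examples/deepspeed/ds_z3_config.json  # ZeRO-3: shards weights across ranks
```

With that line in place, the same FORCE_TORCHRUN launch above shards the parameters across the 8 cards instead of replicating the full model on each one.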
predict may not support deepspeed
I just ran llamafactory-cli train example.yaml directly, and it brought up all 8 GPUs.
I'll give it a try, thanks!
predict works fine, thanks a lot.
cc @yzoaim please add deepspeed zero3 + batch predict to the documentation
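Until the docs are updated, here is an abridged sketch of what the ZeRO-3 JSON referenced above typically looks like; the authoritative version is examples/deepspeed/ds_z3_config.json in your checkout, this only shows the shape:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

"stage": 3 is the setting that makes DeepSpeed partition parameters (plus gradients and optimizer state during training) across ranks, which is what lets a 72B model fit on 8x 80G for batch predict.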
System Info
```yaml
### model
model_name_or_path: llm/Qwen2-72B-Instruct
adapter_name_or_path: saves/qwen2_7b_errata_0705/lora_ace04_instruction_v1_savesteps_10/sft

### method
stage: sft
do_predict: true
finetuning_type: lora

### dataset
dataset: prompt_to_get_cot_normal
template: qwen
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2_72b_errata_0712/lora/predict
overwrite_output_dir: true

### eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000
```
Reproduction
With 8x A100 80G, running predict on 1k samples against the 72B base model hits OOM: every GPU loads the full set of model parameters at the same time, which causes the OOM. According to the official docs 160G should be enough, yet my 80G x 8 isn't. Is this a bug, or is there some parameter I need to set?
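Rough arithmetic for why this OOMs without sharding, assuming bf16 weights and ignoring activations and KV cache:

```text
72B params x 2 bytes (bf16)       ~ 144 GB of weights
no sharding: 144 GB per GPU       > 80 GB        -> OOM on every card
ZeRO-3 over 8 GPUs: 144 GB / 8    ~ 18 GB per GPU -> fits
```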