hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Qwen-72B-Chat and XVERSE-65B-Chat cannot be LoRA (rank=8) fine-tuned on 8x A800 GPUs #2520

Closed: angel1288 closed this issue 8 months ago

angel1288 commented 8 months ago

Reminder

Reproduction

A quick question: with the accelerate-based setup, LoRA (rank=8) fine-tuning of qwen-72b-chat and XVERSE-65B-chat both fail to start and keep running out of memory (OOM). Training precision is fp16; the int4-quantized setup runs fine on 4 GPUs, but the non-quantized one does not. Is the hardware simply insufficient, or is some extra configuration needed? Thanks.

Expected behavior

No response

System Info

No response

Others

No response

hiyouga commented 8 months ago

Use DeepSpeed ZeRO-3 + offload.
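
For reference, a minimal sketch of what a ZeRO-3 + CPU offload file could look like, assembled from standard DeepSpeed options rather than copied from this repository; the offload_optimizer / offload_param blocks and the fp16 section are illustrative assumptions, and the "auto" values are resolved by the launcher at runtime:

    {
      "train_batch_size": "auto",
      "train_micro_batch_size_per_gpu": "auto",
      "gradient_accumulation_steps": "auto",
      "gradient_clipping": "auto",
      "zero_allow_untested_optimizer": true,
      "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
      },
      "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
          "device": "cpu",
          "pin_memory": true
        },
        "offload_param": {
          "device": "cpu",
          "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_gather_16bit_weights_on_model_save": true
      }
    }

Offloading optimizer state and parameters to CPU RAM trades step time for GPU memory, which is typically what allows a 65B-72B model to fit on 8 GPUs for non-quantized LoRA fine-tuning.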

angel1288 commented 8 months ago

@hiyouga Hi, with the following config it now runs, but the loss is 0 right from the start:

    {
      "train_batch_size": "auto",
      "train_micro_batch_size_per_gpu": "auto",
      "gradient_accumulation_steps": "auto",
      "gradient_clipping": "auto",
      "zero_allow_untested_optimizer": true,
      "bf16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
      },
      "zero_optimization": {
        "stage": 3,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
      }
    }

hiyouga commented 8 months ago

Use the config provided in https://github.com/xverse-ai/XVERSE-65B.

angel1288 commented 8 months ago

> Use the config provided in https://github.com/xverse-ai/XVERSE-65B.

Hi, with that config the loss is still 0. I am currently on llmtuner==0.4.0; I will try upgrading to llmtuner==0.5.2.

angel1288 commented 8 months ago

@hiyouga After upgrading the framework to llmtuner==0.5.2, training also works fine with the original config.