hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

[PPU] Has anyone tested LLaMA-Factory in a PPU environment? #4606

Open willionZS opened 5 days ago

willionZS commented 5 days ago

Reminder

The DPO dataset contains only 850 samples.

System Info

model

model_name_or_path: /mnt/ant-cc/yungui.zs/project/IdentifyRequest/checkpoint/factory_qwen14bchat_data72w+choice7w+aq8.3w_ep4_batch16_sft_full_lr5e5/checkpoint-5400/

method

stage: dpo
do_train: true
finetuning_type: full
lora_target: all
pref_beta: 0.1
pref_loss: simpo

dataset

dataset: train_dpo_llama_factory
dataset_dir: /dpo_data_llama_factory_data/
template: qwen
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

output

output_dir: model_output_path
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

train

per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 5.0e-6
num_train_epochs: 100
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000

eval

val_size: 0.05
per_device_eval_batch_size: 1
eval_strategy: epoch
eval_steps: 500
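For reference, the sections above correspond to a single YAML file passed to `llamafactory-cli train`. Below is a minimal sketch that assembles the same arguments from Python and launches the CLI; the output file name `dpo_simpo.yaml` and the shortened model path are placeholders, not values from the original report.

```python
import subprocess
import yaml  # requires pyyaml

# Combined training arguments, mirroring the sections reported above.
config = {
    # model
    "model_name_or_path": "/path/to/sft/checkpoint-5400/",  # placeholder path
    # method
    "stage": "dpo",
    "do_train": True,
    "finetuning_type": "full",
    "lora_target": "all",
    "pref_beta": 0.1,
    "pref_loss": "simpo",
    # dataset
    "dataset": "train_dpo_llama_factory",
    "dataset_dir": "/dpo_data_llama_factory_data/",
    "template": "qwen",
    "cutoff_len": 1024,
    "max_samples": 1000,
    "overwrite_cache": True,
    "preprocessing_num_workers": 16,
    # output
    "output_dir": "model_output_path",
    "logging_steps": 10,
    "save_steps": 500,
    "plot_loss": True,
    "overwrite_output_dir": True,
    # train
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "learning_rate": 5.0e-6,
    "num_train_epochs": 100,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "fp16": True,
    "ddp_timeout": 180000000,
    # eval
    "val_size": 0.05,
    "per_device_eval_batch_size": 1,
    "eval_strategy": "epoch",
    "eval_steps": 500,
}

# Write the YAML file and hand it to the CLI.
with open("dpo_simpo.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

subprocess.run(["llamafactory-cli", "train", "dpo_simpo.yaml"], check=True)
```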

Reproduction

The accelerators in my work environment are PPU-type cards. LLaMA-Factory used to run fine on them when launched via train_bash.py. Recently I switched to the llamafactory-cli workflow for DPO training and found that PPU utilization stays at 0, and the model silently falls back to training on the CPU. Has anyone run LLaMA-Factory on PPUs? I am looking for a dependency stack that works in a PPU environment. Any suggestions? 🙏 (A diagnostic sketch follows below.)
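A quick way to see where training will actually land is to check what PyTorch and Accelerate report before launching the CLI (llamafactory-cli builds on the Hugging Face Trainer, which picks its device through Accelerate). This is a minimal diagnostic sketch assuming the vendor's PPU plugin exposes itself through the standard torch.cuda interface; the exact backend name is not specified in the report.

```python
import torch
from accelerate import Accelerator

# If this prints False, the Trainer will fall back to CPU-only training.
print("torch version:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())
    print("device name:", torch.cuda.get_device_name(0))

# Accelerate's view should match what the training run will use.
accelerator = Accelerator()
print("accelerate device:", accelerator.device)
```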

Expected behavior

No response

Others

No response

GuoZoneDUT commented 1 day ago

Has this been solved? I found that after setting up the environment, torch.cuda.is_available() returns False, so training runs on the CPU.
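When torch.cuda.is_available() returns False, a common first step is to check whether the installed PyTorch wheel was built with any accelerator support at all, to separate a packaging problem (CPU-only wheel) from a driver or runtime problem. A minimal sketch, assuming a stock PyTorch install; vendor PPU plugins may ship their own check utilities not shown here.

```python
import torch

# A CPU-only wheel reports None here; an accelerator-enabled build reports a version string.
print("compiled CUDA version:", torch.version.cuda)
print("runtime can see devices:", torch.cuda.is_available())

# Lists the backends compiled into this build, which helps tell a wrong-wheel
# problem apart from a missing driver/runtime on the machine.
print(torch.__config__.show())
```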