hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

[PPU] Has anyone tested LLaMA-Factory in a PPU environment? #4606

Open willionZS opened 5 days ago

willionZS commented 5 days ago

Reminder

The DPO dataset contains only 850 samples.

System Info

model

model_name_or_path: /mnt/ant-cc/yungui.zs/project/IdentifyRequest/checkpoint/factory_qwen14bchat_data72w+choice7w+aq8.3w_ep4_batch16_sft_full_lr5e5/checkpoint-5400/

method

stage: dpo
do_train: true
finetuning_type: full
lora_target: all
pref_beta: 0.1
pref_loss: simpo

dataset

dataset: train_dpo_llama_factory
dataset_dir: /dpo_data_llama_factory_data/
template: qwen
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

output

output_dir: model_output_path
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

train

per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 5.0e-6
num_train_epochs: 100
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000

eval

val_size: 0.05
per_device_eval_batch_size: 1
eval_strategy: epoch
eval_steps: 500
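For reference, the sections above correspond to a single YAML file passed to `llamafactory-cli train`. Below is a minimal sketch that assembles the same arguments from Python and launches the CLI; the output file name `dpo_simpo.yaml` and the shortened model path are placeholders, not values from the original report.

```python
import subprocess
import yaml  # requires pyyaml

# Combined training arguments, mirroring the sections reported above.
config = {
    # model
    "model_name_or_path": "/path/to/sft/checkpoint-5400/",  # placeholder path
    # method
    "stage": "dpo",
    "do_train": True,
    "finetuning_type": "full",
    "lora_target": "all",
    "pref_beta": 0.1,
    "pref_loss": "simpo",
    # dataset
    "dataset": "train_dpo_llama_factory",
    "dataset_dir": "/dpo_data_llama_factory_data/",
    "template": "qwen",
    "cutoff_len": 1024,
    "max_samples": 1000,
    "overwrite_cache": True,
    "preprocessing_num_workers": 16,
    # output
    "output_dir": "model_output_path",
    "logging_steps": 10,
    "save_steps": 500,
    "plot_loss": True,
    "overwrite_output_dir": True,
    # train
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "learning_rate": 5.0e-6,
    "num_train_epochs": 100,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "fp16": True,
    "ddp_timeout": 180000000,
    # eval
    "val_size": 0.05,
    "per_device_eval_batch_size": 1,
    "eval_strategy": "epoch",
    "eval_steps": 500,
}

# Write the YAML file and hand it to the CLI.
with open("dpo_simpo.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

subprocess.run(["llamafactory-cli", "train", "dpo_simpo.yaml"], check=True)
```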

Reproduction

The accelerators in my work environment are PPU-type cards. LLaMA-Factory used to run fine on them when launched via train_bash.py. Recently I switched to the llamafactory-cli workflow for DPO training and found that PPU utilization stays at 0, and the model silently falls back to training on the CPU. Has anyone run LLaMA-Factory on PPUs? I am looking for a dependency stack that works in a PPU environment. Any suggestions? 🙏 (A diagnostic sketch follows below.)
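A quick way to see where training will actually land is to check what PyTorch and Accelerate report before launching the CLI (llamafactory-cli builds on the Hugging Face Trainer, which picks its device through Accelerate). This is a minimal diagnostic sketch assuming the vendor's PPU plugin exposes itself through the standard torch.cuda interface; the exact backend name is not specified in the report.

```python
import torch
from accelerate import Accelerator

# If this prints False, the Trainer will fall back to CPU-only training.
print("torch version:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())
    print("device name:", torch.cuda.get_device_name(0))

# Accelerate's view should match what the training run will use.
accelerator = Accelerator()
print("accelerate device:", accelerator.device)
```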

Expected behavior

No response

Others

No response

GuoZoneDUT commented 1 day ago

Has this been solved? I found that after setting up the environment, torch.cuda.is_available() returns False, so training runs on the CPU.
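When torch.cuda.is_available() returns False, a common first step is to check whether the installed PyTorch wheel was built with any accelerator support at all, to separate a packaging problem (CPU-only wheel) from a driver or runtime problem. A minimal sketch, assuming a stock PyTorch install; vendor PPU plugins may ship their own check utilities not shown here.

```python
import torch

# A CPU-only wheel reports None here; an accelerator-enabled build reports a version string.
print("compiled CUDA version:", torch.version.cuda)
print("runtime can see devices:", torch.cuda.is_available())

# Lists the backends compiled into this build, which helps tell a wrong-wheel
# problem apart from a missing driver/runtime on the machine.
print(torch.__config__.show())
```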