I changed some parameters in the training code as instructed, but when I run DPO on 8×A6000 GPUs I get the errors below. If I understand correctly, Habana is only used for HPU training.
Details
Traceback (most recent call last):
  File "/data1/yoyo/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/examples/finetuning/dpo_pipeline/dpo_clm.py", line 219, in <module>
    model_args, data_args, training_args, finetune_args = parser.parse_args_into_dataclasses()
  File "/root/anaconda3/envs/intel_eft/lib/python3.10/site-packages/transformers/hf_argparser.py", line 338, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 132, in __init__
  File "/root/anaconda3/envs/intel_eft/lib/python3.10/site-packages/optimum/habana/transformers/training_args.py", line 522, in __post_init__
    device_is_hpu = self.device.type == "hpu"
  File "/root/anaconda3/envs/intel_eft/lib/python3.10/site-packages/transformers/training_args.py", line 1901, in device
    return self._setup_devices
  File "/root/anaconda3/envs/intel_eft/lib/python3.10/site-packages/transformers/utils/generic.py", line 54, in __get__
    cached = self.fget(obj)
  File "/root/anaconda3/envs/intel_eft/lib/python3.10/site-packages/optimum/habana/transformers/training_args.py", line 679, in _setup_devices
    self.distributed_state = GaudiPartialState(cpu=False, backend=self.ddp_backend)
  File "/root/anaconda3/envs/intel_eft/lib/python3.10/site-packages/optimum/habana/accelerate/state.py", line 83, in __init__
    self.device = torch.device("cpu") if cpu else self.default_device
  File "/root/anaconda3/envs/intel_eft/lib/python3.10/site-packages/optimum/habana/accelerate/state.py", line 123, in default_device
    import habana_frameworks.torch.hpu as hthpu
ModuleNotFoundError: No module named 'habana_frameworks'
Also, when I run SFT (finetune_neuralchat_v3.py), accelerate is automatically set to CPU.
Details
[INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cpu (auto detect)
"No device has been set. Use either --use_habana to run on HPU or --no_cuda to run on CPU."
For NVIDIA GPUs you don't need to install optimum-habana, because the code calls 'is_optimum_habana_available()' to check for a Habana device. So you can uninstall that package, and you don't need to set "--use_habana" or "--use_lazy_mode".
"DPOTrainer" inherits from the Hugging Face transformers "Trainer", so the device setup is the same as for it: if the environment has a GPU, the code detects and uses it; if you set "--use_cpu", it runs on CPU.
This is the training script (I don't know how to assign --device; I just added this parameter):
Details
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python dpo_clm.py \
    --model_name_or_path "/data1/yoyo/intel-extension-for-transformers/data/Mistral-7B-v0.1" \
    --output_dir "/data1/yoyo/intel-extension-for-transformers/out/dpo_test" \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --learning_rate 5e-4 \
    --max_steps 1000 \
    --save_steps 10 \
    --lora_alpha 16 \
    --lora_rank 16 \
    --lora_dropout 0.05 \
    --dataset_name Intel/orca_dpo_pairs \
    --bf16 \
    --use_auth_token True \
    --use_habana False \
    --use_lazy_mode False \
    --device "auto"
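Following the suggestion to uninstall optimum-habana, the launch would drop the Habana-specific flags. A hedged sketch of the adjusted command (same paths and hyperparameters as above; not verified on this setup):

```shell
# Remove the Habana stack so the Trainer's GPU auto-detection applies.
pip uninstall -y optimum-habana

# Same launch without --use_habana / --use_lazy_mode / --device.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python dpo_clm.py \
    --model_name_or_path "/data1/yoyo/intel-extension-for-transformers/data/Mistral-7B-v0.1" \
    --output_dir "/data1/yoyo/intel-extension-for-transformers/out/dpo_test" \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --learning_rate 5e-4 \
    --max_steps 1000 \
    --save_steps 10 \
    --lora_alpha 16 \
    --lora_rank 16 \
    --lora_dropout 0.05 \
    --dataset_name Intel/orca_dpo_pairs \
    --bf16 \
    --use_auth_token True
```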
Operating system: CentOS 7
Python: 3.10
torch: 2.1.0
CUDA: 12.2
optimum-habana: 1.9.0
transformers: 4.34.1
accelerate: 0.25.0