When running
torchrun --nproc_per_node=4 train.py --train_args_file train_args/sft/qlora/qwen2-7b-sft-qlora.json
to train qwen2 + qlora + unsloth (use_unsloth=true), I get the following error:
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()}you're training on. Make sure you loaded the model on the correct device using for exampledevice_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}
The parameters in qwen2-7b-sft-qlora.json are set as follows:
The full error output is as follows:
2024-06-20 01:48:35.195 | INFO | main:init_components:388 - Train model with sft task
2024-06-20 01:48:35.196 | INFO | main:load_sft_dataset:351 - Loading data with UnifiedSFTDataset
2024-06-20 01:48:35.196 | INFO | component.dataset:init:19 - Loading data: ./data/dummy_data.jsonl
2024-06-20 01:48:35.197 | INFO | component.dataset:init:22 - Use template "qwen" for training
2024-06-20 01:48:35.197 | INFO | component.dataset:init:23 - There are 33 data in dataset
2024-06-20 01:48:35.207 | INFO | main:main:426 - starting training
Traceback (most recent call last):
File "/dfs/data/code/Firefly/train.py", line 439, in <module>
main()
File "/dfs/data/code/Firefly/train.py", line 427, in main
train_result = trainer.train()
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "<string>", line 159, in _fast_inner_training_loop
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1202, in prepare
result = tuple(
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1203, in <genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1030, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1281, in prepare_model
raise ValueError(
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()}you're training on. Make sure you loaded the model on the correct device using for exampledevice_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}
Without unsloth, single-node multi-GPU training works normally. With unsloth, single-node single-GPU training also works. The error only appears with unsloth + multiple GPUs. What is causing this?
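For reference, the suggestion in the error message (loading the model onto the device of the current process) could be sketched as below. This is only an illustration: the helper name `device_map_for_this_process` is hypothetical and not part of Firefly, unsloth, or accelerate, and it assumes the `LOCAL_RANK` environment variable that torchrun sets for each worker. Whether this resolves the error when unsloth is enabled is not confirmed here.

```python
import os


def device_map_for_this_process(environ=os.environ):
    """Build a device_map that puts the whole model on this worker's GPU.

    Hypothetical helper for illustration. torchrun sets LOCAL_RANK for each
    worker process; fall back to 0 when the script is run without torchrun.
    """
    local_rank = int(environ.get("LOCAL_RANK", "0"))
    # The empty-string key means "the entire model", as in the error message's
    # device_map={'': torch.cuda.current_device()} example.
    return {"": local_rank}


if __name__ == "__main__":
    # Under torchrun --nproc_per_node=4, each worker would print a different index.
    print(device_map_for_this_process())
```

The resulting dict would be passed as `device_map` when the quantized model is loaded. Whether train.py exposes that argument on the unsloth code path is not shown in this report.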