===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll
CUDA SETUP: CUDA runtime path found: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll...
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00, 1.11it/s]
trainable params: 974,848 || all params: 3,389,286,400 || trainable%: 0.0287626327477076
Found cached dataset json (C:/Users/Administrator/.cache/huggingface/datasets/json/default-d642ff6439cea90e/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 243.11it/s]
Found cached dataset json (C:/Users/Administrator/.cache/huggingface/datasets/json/default-bf648ec70cbcb4a4/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.15.3
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
0%| | 0/3581 [00:00<?, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Traceback (most recent call last):
File "E:\Chatglm2-Qlora\chatGLM-6B-QLoRA-main\train_qlora.py", line 206, in
train(args)
File "E:\Chatglm2-Qlora\chatGLM-6B-QLoRA-main\train_qlora.py", line 200, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\trainer.py", line 1645, in train
return inner_training_loop(
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\trainer.py", line 1938, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\trainer.py", line 2770, in training_step
self.accelerator.backward(loss)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 1821, in backward
loss.backward(**kwargs)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_tensor.py", line 487, in backward
torch.autograd.backward(
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\function.py", line 274, in apply
return user_fn(self, *args)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ E:\Chatglm2-Qlora\chatGLM-6B-QLoRA-main\train_qlora.py:206 in │
│ │
│ 203 │
│ 204 if name == "main": │
│ 205 │ args = parse_args() │
│ ❱ 206 │ train(args) │
│ 207 │
│ 208 │
│ │
│ E:\Chatglm2-Qlora\chatGLM-6B-QLoRA-main\train_qlora.py:200 in train │
│ │
│ 197 │ │ data_collator=data_collator │
│ 198 │ ) │
│ 199 │ │
│ ❱ 200 │ trainer.train(resume_from_checkpoint=resume_from_checkpoint) │
│ 201 │ trainer.model.save_pretrained(hf_train_args.output_dir) │
│ 202 │
│ 203 │
│ │
│ C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\tr │
│ ainer.py:1645 in train │
│ │
│ 1642 │ │ inner_training_loop = find_executable_batch_size( │
│ 1643 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1644 │ │ ) │
│ ❱ 1645 │ │ return inner_training_loop( │
│ 1646 │ │ │ args=args, │
│ 1647 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1648 │ │ │ trial=trial, │
│ │
│ C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\tr │
│ ainer.py:1938 in _inner_training_loop │
│ │
│ 1935 │ │ │ │ │ self.control = self.callback_handler.on_step_begin(args, self.state, │
│ 1936 │ │ │ │ │
│ 1937 │ │ │ │ with self.accelerator.accumulate(model): │
│ ❱ 1938 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1939 │ │ │ │ │
│ 1940 │ │ │ │ if ( │
│ 1941 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\tr │
│ ainer.py:2770 in training_step │
│ │
│ 2767 │ │ │ with amp.scale_loss(loss, self.optimizer) as scaled_loss: │
│ 2768 │ │ │ │ scaled_loss.backward() │
│ 2769 │ │ else: │
│ ❱ 2770 │ │ │ self.accelerator.backward(loss) │
│ 2771 │ │ │
│ 2772 │ │ return loss.detach() / self.args.gradient_accumulation_steps │
│ 2773 │
│ │
│ C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\acce │
│ lerator.py:1821 in backward │
│ │
│ 1818 │ │ elif self.scaler is not None: │
│ 1819 │ │ │ self.scaler.scale(loss).backward(kwargs) │
│ 1820 │ │ else: │
│ ❱ 1821 │ │ │ loss.backward(kwargs) │
│ 1822 │ │
│ 1823 │ def unscale_gradients(self, optimizer=None): │
│ 1824 │ │ """ │
│ │
│ C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch_tensor.p │
│ y:487 in backward │
│ │
│ 484 │ │ │ │ create_graph=create_graph, │
│ 485 │ │ │ │ inputs=inputs, │
│ 486 │ │ │ ) │
│ ❱ 487 │ │ torch.autograd.backward( │
│ 488 │ │ │ self, gradient, retain_graph, create_graph, inputs=inputs │
│ 489 │ │ ) │
│ 490 │
│ │
│ C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\ │
│ init.py:200 in backward │
│ │
│ 197 │ # The reason we repeat same the comment below is that │
│ 198 │ # some Python versions print out the first line of a multi-line function │
│ 199 │ # calls in the traceback and some print out the last line │
│ ❱ 200 │ Variable._execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 201 │ │ tensors, gradtensors, retain_graph, create_graph, inputs, │
│ 202 │ │ allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to ru │
│ 203 │
│ │
│ C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\ │
│ function.py:274 in apply │
│ │
│ 271 │ │ │ │ │ │ │ "Function is not allowed. You should only implement one " │
│ 272 │ │ │ │ │ │ │ "of them.") │
│ 273 │ │ user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn │
│ ❱ 274 │ │ return user_fn(self, args) │
│ 275 │ │
│ 276 │ def apply_jvp(self, args): │
│ 277 │ │ # _forward_cls is defined by derived class │
│ │
│ C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\che │
│ ckpoint.py:157 in backward │
│ │
│ 154 │ │ │ raise RuntimeError( │
│ 155 │ │ │ │ "none of output has requires_grad=True," │
│ 156 │ │ │ │ " this checkpoint() is not necessary") │
│ ❱ 157 │ │ torch.autograd.backward(outputs_with_grad, args_with_grad) │
│ 158 │ │ grads = tuple(inp.grad if isinstance(inp, torch.Tensor) else None │
│ 159 │ │ │ │ │ for inp in detached_inputs) │
│ 160 │
│ │
│ C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\ │
│ init.py:200 in backward │
│ │
│ 197 │ # The reason we repeat same the comment below is that │
│ 198 │ # some Python versions print out the first line of a multi-line function │
│ 199 │ # calls in the traceback and some print out the last line │
│ ❱ 200 │ Variable._execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 201 │ │ tensors, gradtensors, retain_graph, create_graph, inputs, │
│ 202 │ │ allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to ru │
│ 203 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
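The traceback's own suggestion is to rerun with CUDA_LAUNCH_BLOCKING=1 so the failing kernel is reported synchronously rather than at a later API call. For reference, here is a minimal sketch of how I would do that (an illustration only, not part of the repository's train_qlora.py; the variable has to be set before CUDA is initialized):

# Hypothetical debugging snippet, not part of train_qlora.py.
# CUDA_LAUNCH_BLOCKING=1 (as the error message recommends) makes kernel launches
# synchronous, so the stack trace points at the op that actually failed.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before torch initializes CUDA

import torch

# Quick sanity check of the environment the log reports:
# CUDA 11.8 runtime and a GPU with compute capability 8.9.
print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0),
          "capability:", torch.cuda.get_device_capability(0))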
To summarize: when fine-tuning chatglm2 with QLoRA (LoRA adapters), training fails with "RuntimeError: CUDA error: invalid argument" as shown above. Environment: Windows, Python 3.10, CUDA 11.8. The command used:

PS E:\Chatglm2-Qlora\chatGLM-6B-QLoRA-main> python train_qlora.py --train_args_json chatGLM_6B_QLoRA.json --model_name_or_path THUDM/chatglm2-6b --train_data_path data/train.jsonl --eval_data_path data/dev.jsonl --lora_rank 4 --lora_dropout 0.05 --compute_dtype fp16
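For context, this is roughly how I understand those flags to map onto a QLoRA setup (a sketch assuming train_qlora.py follows the usual peft + bitsandbytes recipe; the target module name and lora_alpha value below are my assumptions, not the repository's exact code):

# Assumed QLoRA configuration implied by the command-line flags; a sketch only.
import torch
from transformers import AutoModel, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # QLoRA: 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # --compute_dtype fp16
)

model = AutoModel.from_pretrained(
    "THUDM/chatglm2-6b",                   # --model_name_or_path
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)  # also enables gradient checkpointing

lora_config = LoraConfig(
    r=4,                                   # --lora_rank 4
    lora_alpha=32,                         # assumption; not set on the command line
    lora_dropout=0.05,                     # --lora_dropout 0.05
    target_modules=["query_key_value"],    # assumption: the usual ChatGLM2 target
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # produces the "trainable params: 974,848 ..." line above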