Closed: SeekPoint closed this issue 1 year ago.
Try this:
model = AutoModel.from_pretrained("THUDM/chatglm-6b").half().cuda()
It works with trust_remote_code=True!
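The suggested one-liner loads ChatGLM-6B without trust_remote_code, and the reply above says the call only works once that flag is added. THUDM/chatglm-6b ships its own modeling code on the Hugging Face Hub, so transformers has to be allowed to run it. Below is a minimal sketch of the working call. It assumes transformers is installed and a CUDA GPU is available. `build_load_kwargs` and `load_chatglm_half_cuda` are hypothetical helper names. The optional `revision` pin is an addition prompted by the `revision` warnings in the log further down.

```python
from typing import Optional

MODEL_ID = "THUDM/chatglm-6b"


def build_load_kwargs(revision: Optional[str] = None) -> dict:
    """Keyword arguments for AutoModel.from_pretrained on THUDM/chatglm-6b."""
    # The checkpoint ships custom modeling code, so transformers must be
    # explicitly allowed to execute it.
    kwargs = {"trust_remote_code": True}
    # Optional: pin a commit or tag so a later upstream change to the remote
    # code is not picked up silently (hypothetical extra, not from the thread).
    if revision is not None:
        kwargs["revision"] = revision
    return kwargs


def load_chatglm_half_cuda(revision: Optional[str] = None):
    """Load ChatGLM-6B in float16 on the current CUDA device."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModel

    model = AutoModel.from_pretrained(MODEL_ID, **build_load_kwargs(revision))
    # Same post-processing as the suggested one-liner: fp16 weights, moved to GPU.
    return model.half().cuda()
```

Calling `load_chatglm_half_cuda()` downloads the checkpoint on first use. To pin the remote code, pass a commit hash from the model repository as `revision`. Which commit to pin is left to you.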
It solved my problem! Thanks!
Received, thanks!
(gh_ChatGLM-Tuning) ub2004@ub2004-B85M-A0:~/llm_dev/ChatGLM-Tuning$ python3 finetune.py --dataset_path data/alpaca --lora_rank 2 --per_device_train_batch_size 2 --gradient_accumulation_steps 1 --max_steps 500 --save_steps 100 --save_total_limit 2 --learning_rate 1e-4 --fp16 --remove_unused_columns false --logging_steps 50 --output_dir output
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home/ub2004/anaconda3/envs/gh_ChatGLM-Tuning/lib')}
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/ub2004/anaconda3/envs/gh_ChatGLM-Tuning did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/1648,unix/ub2004-B85M-A0'), PosixPath('local/ub2004-B85M-A0')}
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
  warn(msg)
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/org/gnome/Terminal/screen/00a04b8e_2929_4d34_a713_fca57864faa5')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 117
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so...
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:46<00:00, 5.81s/it]
len(dataset)=49917
You are adding a <class 'transformers.integrations.TensorBoardCallback'> to the callbacks of this Trainer, but there is already one. The currentlist of callbacks is
:DefaultFlowCallback
TensorBoardCallback
WandbCallback
/home/ub2004/.local/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.14.2
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
  0%|          | 0/500 [00:00<?, ?it/s]
/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
  File "finetune.py", line 118, in <module>
    main()
  File "finetune.py", line 111, in main
    trainer.train()
  File "/home/ub2004/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1633, in train
    return inner_training_loop(
  File "/home/ub2004/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1902, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ub2004/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2655, in training_step
    self.scaler.scale(loss).backward()
  File "/home/ub2004/.local/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/ub2004/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/ub2004/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/home/ub2004/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/home/ub2004/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/ub2004/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 456, in backward
    grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype_A)
RuntimeError: expected scalar type Half but found Float
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/ub2004/llm_dev/ChatGLM-Tuning/finetune.py:118 in <module> │
│ │
│ 115 │
│ 116 │
│ 117 if __name__ == "__main__": │
│ ❱ 118 │ main() │
│ 119 │
│ │
│ /home/ub2004/llm_dev/ChatGLM-Tuning/finetune.py:111 in main │
│ │
│ 108 │ │ callbacks=[TensorBoardCallback(writer)], │
│ 109 │ │ data_collator=data_collator, │
│ 110 │ ) │
│ ❱ 111 │ trainer.train() │
│ 112 │ writer.close() │
│ 113 │ # save model │
│ 114 │ model.save_pretrained(training_args.output_dir) │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/transformers/trainer.py:1633 in train │
│ │
│ 1630 │ │ inner_training_loop = find_executable_batch_size( │
│ 1631 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1632 │ │ ) │
│ ❱ 1633 │ │ return inner_training_loop( │
│ 1634 │ │ │ args=args, │
│ 1635 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1636 │ │ │ trial=trial, │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/transformers/trainer.py:1902 in │
│ _inner_training_loop │
│ │
│ 1899 │ │ │ │ │ with model.no_sync(): │
│ 1900 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1901 │ │ │ │ else: │
│ ❱ 1902 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1903 │ │ │ │ │
│ 1904 │ │ │ │ if ( │
│ 1905 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/transformers/trainer.py:2655 in training_step │
│ │
│ 2652 │ │ │ loss = loss / self.args.gradient_accumulation_steps │
│ 2653 │ │ │
│ 2654 │ │ if self.do_grad_scaling: │
│ ❱ 2655 │ │ │ self.scaler.scale(loss).backward() │
│ 2656 │ │ elif self.use_apex: │
│ 2657 │ │ │ with amp.scale_loss(loss, self.optimizer) as scaled_loss: │
│ 2658 │ │ │ │ scaled_loss.backward() │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/torch/_tensor.py:488 in backward │
│ │
│ 485 │ │ │ │ create_graph=create_graph, │
│ 486 │ │ │ │ inputs=inputs, │
│ 487 │ │ │ ) │
│ ❱ 488 │ │ torch.autograd.backward( │
│ 489 │ │ │ self, gradient, retain_graph, create_graph, inputs=inputs │
│ 490 │ │ ) │
│ 491 │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/torch/autograd/__init__.py:197 in backward │
│ │
│ 194 │ # The reason we repeat same the comment below is that │
│ 195 │ # some Python versions print out the first line of a multi-line function │
│ 196 │ # calls in the traceback and some print out the last line │
│ ❱ 197 │ Variable._execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 198 │ │ tensors, grad_tensors_, retain_graph, create_graph, inputs, │
│ 199 │ │ allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to ru │
│ 200 │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/torch/autograd/function.py:267 in apply │
│ │
│ 264 │ │ │ │ │ │ │ "Function is not allowed. You should only implement one " │
│ 265 │ │ │ │ │ │ │ "of them.") │
│ 266 │ │ user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn │
│ ❱ 267 │ │ return user_fn(self, *args) │
│ 268 │ │
│ 269 │ def apply_jvp(self, *args): │
│ 270 │ │ # _forward_cls is defined by derived class │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py:157 in backward │
│ │
│ 154 │ │ │ raise RuntimeError( │
│ 155 │ │ │ │ "none of output has requires_grad=True," │
│ 156 │ │ │ │ " this checkpoint() is not necessary") │
│ ❱ 157 │ │ torch.autograd.backward(outputs_with_grad, args_with_grad) │
│ 158 │ │ grads = tuple(inp.grad if isinstance(inp, torch.Tensor) else None │
│ 159 │ │ │ │ │ for inp in detached_inputs) │
│ 160 │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/torch/autograd/__init__.py:197 in backward │
│ │
│ 194 │ # The reason we repeat same the comment below is that │
│ 195 │ # some Python versions print out the first line of a multi-line function │
│ 196 │ # calls in the traceback and some print out the last line │
│ ❱ 197 │ Variable._execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 198 │ │ tensors, grad_tensors_, retain_graph, create_graph, inputs, │
│ 199 │ │ allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to ru │
│ 200 │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/torch/autograd/function.py:267 in apply │
│ │
│ 264 │ │ │ │ │ │ │ "Function is not allowed. You should only implement one " │
│ 265 │ │ │ │ │ │ │ "of them.") │
│ 266 │ │ user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn │
│ ❱ 267 │ │ return user_fn(self, *args) │
│ 268 │ │
│ 269 │ def apply_jvp(self, *args): │
│ 270 │ │ # _forward_cls is defined by derived class │
│ │
│ /home/ub2004/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:456 in │
│ backward │
│ │
│ 453 │ │ │ │
│ 454 │ │ │ elif state.CB is not None: │
│ 455 │ │ │ │ CB = state.CB.to(ctx.dtype_A, copy=True).mul_(state.SCB.unsqueeze(1).mul │
│ ❱ 456 │ │ │ │ grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype │
│ 457 │ │ │ elif state.CxB is not None: │
│ 458 │ │ │ │ │
│ 459 │ │ │ │ if state.tile_indices is None: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: expected scalar type Half but found Float
wandb: Waiting for W&B process to finish... (failed 1).
wandb: You can sync this run to the cloud by running:
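Two parts of this log are worth reading together, though the following is an inference from the trace and not a confirmed diagnosis. First, bitsandbytes reports compute capability 6.1, warns that anything below 7.5 gets only the slow 8-bit matmul, and loads `libbitsandbytes_cuda117_nocublaslt.so`. Second, the failing frame is the `state.CB` branch of the 8-bit backward, where `CB` is cast to `ctx.dtype_A`. The `MatMul8bitLt` warning says the layer inputs arrived as `torch.float32`. If so, a float32 `CB` would be multiplied with the float16 `grad_output` produced under `--fp16`, which matches `expected scalar type Half but found Float`.

To run the same capability check on another machine, here is a minimal stdlib sketch that works on saved console output. `detected_capability` and `has_fast_int8_matmul` are hypothetical helpers, not bitsandbytes APIs. The 7.5 threshold is taken from the warning text in this log.

```python
import re
from typing import Optional, Tuple

# Threshold copied from the bitsandbytes warning in this log:
# "Compute capability < 7.5 detected! Only slow 8-bit matmul is supported".
FAST_INT8_MIN_CAPABILITY = (7, 5)

# Matches the "CUDA SETUP: Highest compute capability ..." line.
_CAPABILITY_RE = re.compile(
    r"Highest compute capability among GPUs detected:\s*(\d+)\.(\d+)"
)


def detected_capability(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from bitsandbytes CUDA SETUP output, or None if absent."""
    match = _CAPABILITY_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_fast_int8_matmul(capability: Tuple[int, int]) -> bool:
    """True if capability >= 7.5; tuples compare major first, then minor."""
    return capability >= FAST_INT8_MIN_CAPABILITY


log_line = "CUDA SETUP: Highest compute capability among GPUs detected: 6.1"
cap = detected_capability(log_line)
print(cap, has_fast_int8_matmul(cap))  # (6, 1) False
```

With PyTorch installed on the GPU machine, `torch.cuda.get_device_capability()` returns the same `(major, minor)` tuple directly. You can pass that tuple to `has_fast_int8_matmul` and skip the log parsing.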