Finetune leads to ValueError: Expected a cuda device, but got: cpu

Dear all,

executing finetuning.py consistantly leads for me to the above error. I would be very thankful for a hint that leads me to the right direction. The log was taken from Ubuntu 22.04 machine, but was also seen on a Windows machine using Ubuntu LSM.

python3 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'yahma/alpaca-cleaned' --output_dir './lora-alpaca'

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

/home/home/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('local/home-H170N-WIFI'), PosixPath('@/tmp/.ICE-unix/3847,unix/home-H170N-WIFI')} warn(msg) /home/home/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('1'), PosixPath('0')} warn(msg) /home/home/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/org/gnome/Terminal/screen/3d4dfe16_8624_4492_8c1f_662466cb08ab')} warn(msg) CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 6.1 CUDA SETUP: Detected CUDA version 121 /home/home/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU! warn(msg) CUDA SETUP: Loading binary /home/home/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121_nocublaslt.so... Training Alpaca-LoRA model with params: base_model: decapoda-research/llama-7b-hf data_path: yahma/alpaca-cleaned output_dir: ./lora-alpaca batch_size: 128 micro_batch_size: 4 num_epochs: 3 learning_rate: 0.0003 cutoff_len: 256 val_set_size: 2000 lora_r: 8 lora_alpha: 16 lora_dropout: 0.05 lora_target_modules: ['q_proj', 'v_proj'] train_on_inputs: True group_by_length: False wandb_project: wandb_run_name: wandb_watch: wandb_log_model: resume_from_checkpoint: None

Traceback (most recent call last): File "/home/home/alpaca-lora/finetune.py", line 294, in fire.Fire(train) File "/home/home/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/home/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/home/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace component = fn(varargs, kwargs) File "/home/home/alpaca-lora/finetune.py", line 261, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/home/home/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train return inner_training_loop( File "/home/home/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1911, in _inner_training_loop tr_loss_step = self.training_step(model, inputs) File "/home/home/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2657, in training_step loss = self.compute_loss(model, inputs) File "/home/home/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2689, in compute_loss outputs = model(inputs) File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, *kwargs) File "/home/home/.local/lib/python3.10/site-packages/peft/peft_model.py", line 529, in forward return self.base_model( File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 710, in forward outputs = self.model( File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 590, in forward layer_outputs = torch.utils.checkpoint.checkpoint( File "/home/home/.local/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint return CheckpointFunction.apply(function, preserve, args) File "/home/home/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply return super().apply(args, kwargs) # type: ignore[misc] File "/home/home/.local/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward outputs = run_function(args) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 586, in custom_forward return module(inputs, output_attentions, None) File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 313, in forward hidden_states, self_attn_weights, present_key_value = self.self_attn( File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, *kwargs) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 214, in forward query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2) File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/peft/tuners/lora.py", line 522, in forward result = super().forward(x) File "/home/home/.local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state) File "/home/home/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul return MatMul8bitLt.apply(A, B, out, bias, state) File "/home/home/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply return super().apply(*args, *kwargs) # type: ignore[misc] File "/home/home/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 273, in forward using_igemmlt = torch.cuda.get_device_capability(device=A.device) >= (7, 5) and not state.force_no_igemmlt File "/home/home/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 381, in get_device_capability prop = get_device_properties(device) File "/home/home/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 396, in get_device_properties device = _get_device_index(device, optional=True) File "/home/home/.local/lib/python3.10/site-packages/torch/cuda/_utils.py", line 32, in _get_device_index raise ValueError('Expected a cuda device, but got: {}'.format(device)) ValueError: Expected a cuda device, but got: cpu Traceback (most recent call last): File "/home/home/alpaca-lora/finetune.py", line 294, in fire.Fire(train) File "/home/home/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/home/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/home/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace component = fn(varargs, kwargs) File "/home/home/alpaca-lora/finetune.py", line 261, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/home/home/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train return inner_training_loop( File "/home/home/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1911, in _inner_training_loop tr_loss_step = self.training_step(model, inputs) File "/home/home/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2657, in training_step loss = self.compute_loss(model, inputs) File "/home/home/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2689, in compute_loss outputs = model(inputs) File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/peft/peft_model.py", line 529, in forward return self.base_model( File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, *kwargs) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 710, in forward outputs = self.model( File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 590, in forward layer_outputs = torch.utils.checkpoint.checkpoint( File "/home/home/.local/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint return CheckpointFunction.apply(function, preserve, args) File "/home/home/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply return super().apply(args, kwargs) # type: ignore[misc] File "/home/home/.local/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward outputs = run_function(args) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 586, in custom_forward return module(inputs, output_attentions, None) File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, *kwargs) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 313, in forward hidden_states, self_attn_weights, present_key_value = self.self_attn( File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 214, in forward query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2) File "/home/home/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "/home/home/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(args, kwargs) File "/home/home/.local/lib/python3.10/site-packages/peft/tuners/lora.py", line 522, in forward result = super().forward(x) File "/home/home/.local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state) File "/home/home/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul return MatMul8bitLt.apply(A, B, out, bias, state) File "/home/home/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply return super().apply(*args, **kwargs) # type: ignore[misc] File "/home/home/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 273, in forward using_igemmlt = torch.cuda.get_device_capability(device=A.device) >= (7, 5) and not state.force_no_igemmlt File "/home/home/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 381, in get_device_capability prop = get_device_properties(device) File "/home/home/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 396, in get_device_properties device = _get_device_index(device, optional=True) File "/home/home/.local/lib/python3.10/site-packages/torch/cuda/_utils.py", line 32, in _get_device_index raise ValueError('Expected a cuda device, but got: {}'.format(device)) ValueError: Expected a cuda device, but got: cpu

Thanks, Oliver

i am encoutering the same issue here during generate.py with a 8.6 Compute capable GPU

CUDA SETUP: Highest compute capability among GPUs detected: 8.6 CUDA SETUP: Detected CUDA version 118 CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so... Loading checkpoint shards: 100%|██████████| 33/33 [00:07<00:00, 4.16it/s] /usr/local/lib/python3.10/dist-packages/gradio/inputs.py:27: UserWarning: Usage of gradio.inputs is deprecated, and will not be supported in the future, please import your component from gradio.components warnings.warn( /usr/local/lib/python3.10/dist-packages/gradio/deprecation.py:40: UserWarning: optional parameter is deprecated, and it has no effect warnings.warn(value) /usr/local/lib/python3.10/dist-packages/gradio/deprecation.py:40: UserWarning: numeric parameter is deprecated, and it has no effect warnings.warn(value) /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1405: UserWarning: You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on cuda, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('cpu') before running .generate(). warnings.warn( Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 393, in run_predict output = await app.get_blocks().process_api( File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1108, in process_api result = await self.call_function( File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 915, in call_function prediction = await anyio.to_thread.run_sync( File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 31, in run_sync return await get_asynclib().run_sync_in_worker_thread( File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread return await future File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 867, in run result = context.run(func, *args) File "/workspace/generate.py", line 107, in evaluate generation_output = model.generate( File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 627, in generate outputs = self.base_model.generate(kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, *kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1524, in generate return self.beam_search( File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2810, in beam_search outputs = self( File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 687, in forward outputs = self.model( File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 577, in forward layer_outputs = decoder_layer( File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, *kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 292, in forward hidden_states, self_attn_weights, present_key_value = self.self_attn( File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 196, in forward query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 576, in forward result = super().forward(x) File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/nn/modules.py", line 242, in forward out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state) File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul return MatMul8bitLt.apply(A, B, out, bias, state) File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply return super().apply(*args, **kwargs) # type: ignore[misc] File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py", line 273, in forward using_igemmlt = torch.cuda.get_device_capability(device=A.device) >= (7, 5) and not state.force_no_igemmlt File "/usr/local/lib/python3.10/dist-packages/torch/cuda/init.py", line 390, in get_device_capability prop = get_device_properties(device) File "/usr/local/lib/python3.10/dist-packages/torch/cuda/init.py", line 405, in get_device_properties device = _get_device_index(device, optional=True) File "/usr/local/lib/python3.10/dist-packages/torch/cuda/_utils.py", line 32, in _get_device_index raise ValueError('Expected a cuda device, but got: {}'.format(device)) ValueError: Expected a cuda device, but got: cpu

tloen / alpaca-lora

Finetune leads to ValueError: Expected a cuda device, but got: cpu #216