Open
Hello1024 opened 1 year ago
Thanks for the report. Looks like my janky model loading/reloading isn't working. Fixing.
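For context on the failure mode: reusing an already-quantized model object across load/unload cycles can leave bitsandbytes' Int8 weight state empty, which is what the traceback further down ultimately trips over. A minimal sketch of the safer pattern, rebuilding the base model (and adapter) from scratch on each switch; the function and argument names here are hypothetical, not the repository's exact code:

```python
from typing import Optional

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM


def reload_model(base_model: str, lora_path: Optional[str] = None):
    """Rebuild the 8-bit base model, and optionally a LoRA adapter, from scratch.

    Loading fresh forces bitsandbytes to re-quantize the weights, so the
    Int8 state it needs at inference time (state.CB) stays populated.
    """
    model = LlamaForCausalLM.from_pretrained(
        base_model,
        load_in_8bit=True,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    if lora_path is not None:
        model = PeftModel.from_pretrained(model, lora_path)
    model.eval()
    return model
```

Rebuilding from scratch is slower than swapping adapters on a live model, but it avoids carrying any stale quantization state between runs.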
Should be fixed in latest master!
@lxe I just did a git clone to try out your fix and I still get this on Colab :(
I just tried on a fresh Tesla T4 Colab and was unable to reproduce, using the vanilla model with no LoRA.
Just reproduced! Will fix!
I would try again on Colab.
@lxe I just deleted my runtime to get a fresh environment, saw git clone pull a new copy of the code, and I still get the error on a GPU environment on Colab.
Error during inference on Colab:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.9/dist-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.9/dist-packages/gradio/helpers.py", line 587, in tracked_fn
    response = fn(*args)
  File "/content/simple-llama-finetuner/main.py", line 104, in generate_text
    output = model.generate(  # type: ignore
  File "/usr/local/lib/python3.9/dist-packages/peft/peft_model.py", line 581, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 1462, in generate
    return self.sample(
  File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 2478, in sample
    outputs = self(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 705, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 593, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 311, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 212, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/peft/tuners/lora.py", line 522, in forward
    result = super().forward(x)
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/autograd/_functions.py", line 317, in forward
    state.CxB, state.SB = F.transform(state.CB, to_order=formatB)
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/functional.py", line 1698, in transform
    prev_device = pre_call(A.device)
AttributeError: 'NoneType' object has no attribute 'device'
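For anyone debugging this: the trace bottoms out in bitsandbytes' MatMul8bitLt.forward, where state.CB (the layer's quantized Int8 weight buffer) is None by the time F.transform runs, i.e. the quantization state of the 8-bit linear layers has been lost. That points at a stale model object surviving a load/unload cycle rather than at the LoRA itself. As a workaround sketch, reusing the hypothetical reload_model helper above, loading fresh immediately before generating should sidestep the stale state (the checkpoint name is a placeholder):

```python
import torch
from transformers import LlamaTokenizer

BASE_MODEL = "decapoda-research/llama-7b-hf"  # placeholder checkpoint; substitute your own

tokenizer = LlamaTokenizer.from_pretrained(BASE_MODEL)
model = reload_model(BASE_MODEL)  # fresh 8-bit load, see the sketch above

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Failing that, restarting the Colab runtime so the model is loaded exactly once per session avoids the reload path entirely.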