project-baize / baize-chatbot

Let ChatGPT teach your own chatbot in hours with a single GPU!
https://arxiv.org/abs/2304.01196
GNU General Public License v3.0

Errors running 13b 8bit #23

Open gamerscomplete opened 1 year ago

gamerscomplete commented 1 year ago

I can run the 7b without issue, but when loading 13b I get the following error. The error comes up as soon as the first message is sent.

Traceback (most recent call last):
  File "/home/chris/miniconda3/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/chris/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/home/chris/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 898, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/chris/miniconda3/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/chris/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/chris/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 549, in async_iteration
    return next(iterator)
  File "/vol/storage/checkout/baize/demo/app.py", line 48, in predict
    for x in sample_decode(
  File "/vol/storage/checkout/baize/demo/app_modules/utils.py", line 265, in sample_decode
    outputs = model(input_ids)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/peft/peft_model.py", line 579, in forward
    return self.base_model(
  File "/home/chris/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/home/chris/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/home/chris/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/chris/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/peft/tuners/lora.py", line 591, in forward
    result = super().forward(x)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/chris/miniconda3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 317, in forward
    state.CxB, state.SB = F.transform(state.CB, to_order=formatB)
  File "/home/chris/miniconda3/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1698, in transform
    prev_device = pre_call(A.device)
AttributeError: 'NoneType' object has no attribute 'device'
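
The exception comes from deep inside bitsandbytes' 8-bit matmul: state.CB, the layer's quantized int8 weight buffer, is None, so F.transform has nothing to read a .device from. For context, the 8-bit load path looks roughly like this (a sketch; the checkpoint and adapter names are illustrative assumptions, not verbatim repo code):

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base = "decapoda-research/llama-13b-hf"   # hypothetical base checkpoint
adapter = "project-baize/baize-lora-13B"  # hypothetical LoRA adapter name

tokenizer = LlamaTokenizer.from_pretrained(base)
model = LlamaForCausalLM.from_pretrained(
    base,
    load_in_8bit=True,        # bitsandbytes quantizes the linear layers on load
    torch_dtype=torch.float16,
    device_map="auto",        # accelerate chooses device placement
)
model = PeftModel.from_pretrained(model, adapter, torch_dtype=torch.float16)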

yfq512 commented 1 year ago

same issue!

JetRunner commented 1 year ago

@guoday Can you look into this?

guoday commented 1 year ago

When you load 7b, do you also use 8-bit, or only fp16?

gamerscomplete commented 1 year ago

I can run 7b both with and without load_8bit=True and it works fine.

guoday commented 1 year ago

I attempted to load 13b with 8-bit and it works without issue. It appears the error is caused by bitsandbytes. Unfortunately, I am unsure how to resolve it.
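
One way to isolate this is a tiny, Baize-independent check of bitsandbytes' 8-bit linear layer; if this also crashes, the problem is in the bitsandbytes/CUDA install rather than in the 13b checkpoint (a sketch, assuming a single CUDA GPU is visible):

import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False)
layer = layer.cuda()  # moving to GPU triggers int8 quantization of the weights
x = torch.randn(1, 64, dtype=torch.float16, device="cuda")
with torch.no_grad():
    y = layer(x)  # goes through the same MatMul8bitLt path as the traceback
print(y.shape)  # expected: torch.Size([1, 64])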

gamerscomplete commented 1 year ago

@guoday I am on version 0.37.2. What version of bitsandbytes are you using? I can try changing versions to see if that resolves it.

guoday commented 1 year ago

My version is also 0.37.2. I list my related environment settings here:

Python 3.8
bitsandbytes 0.37.2
CUDA 12.1
Transformers 4.28.0.dev0
peft 0.3.0.dev0
torch 2.0.0+cu117

KhalilWong commented 1 year ago

Are you using multiple GPUs? If so, change device_map="auto" in utils.py line 353 to device_map={"": 0} (or another GPU index), and pass the same device_map to PeftModel.from_pretrained(). A sketch of the change follows below.
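
A sketch of the suggested change, reusing the base/adapter names from the load sketch earlier in the thread (the kwargs besides device_map are assumptions modeled on the demo, not a verbatim patch):

device_map = {"": 0}  # "" = the root module (whole model); 0 = GPU index to pin it to

model = LlamaForCausalLM.from_pretrained(
    base,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map=device_map,  # was device_map="auto" in utils.py line 353
)
model = PeftModel.from_pretrained(
    model,
    adapter,
    torch_dtype=torch.float16,
    device_map=device_map,  # per the suggestion, pass it here as well
)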

guoday commented 1 year ago

> Are you using multiple GPUs? If so, change device_map="auto" in utils.py line 353 to device_map={"": 0} (or another GPU index), and pass the same device_map to PeftModel.from_pretrained().

Thanks. We do not attempt to utilize multiple GPUs in our code and only support a single GPU for now. Perhaps the issue could be resolved by running CUDA_VISIBLE_DEVICES=0 python app.py.
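
Equivalently, the restriction can be applied from inside Python, as long as it happens before torch initializes CUDA (a generic pattern, not code from this repo):

import os

# Must be set before torch/transformers are imported: CUDA device
# enumeration is fixed when the driver is first initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())  # now reports 1, even on a multi-GPU machine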