run app.py error

XvHaidong commented 1 year ago

Hello, when I run demo/app.py with the 7B model, I get this error: "addmm_impl_cpu_" not implemented for 'Half'. Could you please tell me how to fix it?

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/blocks.py", line 1069, in process_api
    result = await self.call_function(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/blocks.py", line 892, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/utils.py", line 549, in async_iteration
    return next(iterator)
  File "app.py", line 43, in predict
    for x in greedy_search(input_ids,model,tokenizer,stop_words=["[|Human|]", "[|AI|]"],max_length=max_length_tokens,temperature=temperature,top_p=top_p):
  File "/media/hlt/disk/chenyang_space/chenyang_space/xhd_space/baize-main/demo/app_modules/utils.py", line 253, in greedy_search
    outputs = model(input_ids)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/peft/peft_model.py", line 575, in forward
    return self.base_model(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/peft/tuners/lora.py", line 406, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

hecor commented 1 year ago

I got this error too on a MacBook M1. Please help, thanks~

guoday commented 1 year ago

Fix done, please check again.
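The thread does not say what the fix was. As a minimal sketch of one common workaround (an assumption, not necessarily the change made in this repository): the traceback shows the CPU matrix-multiply kernel rejecting 'Half' (float16) tensors, so the weights can be kept in float16 only on a CUDA GPU and loaded in float32 otherwise. The helper name `pick_dtype_name` is hypothetical.

```python
def pick_dtype_name(device: str) -> str:
    """Return the name of a torch dtype suited to the given device string.

    Hypothetical helper: half precision on CUDA, full precision elsewhere,
    because the traceback shows the CPU matmul kernel rejecting 'Half'.
    """
    return "float16" if device.startswith("cuda") else "float32"


if __name__ == "__main__":
    for device in ("cuda", "cuda:0", "cpu"):
        print(device, "->", pick_dtype_name(device))
    # With PyTorch installed, the name maps to a real dtype via
    # getattr(torch, pick_dtype_name(device)), which can then be passed
    # to model.to(dtype=...) before running inference.
```

Falling back to float32 roughly doubles the memory needed for the weights compared with float16, which matters for a 7B model.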

hecor commented 1 year ago

Great, thanks

hecor commented 1 year ago

But generating a reply is very slow on the MacBook M1, roughly one word per minute. Are there any parameters I can change to speed this up?

guoday commented 1 year ago

You need to use a GPU. It is very slow if you run on the CPU.
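To check which device PyTorch would actually use, a small probe like the one below may help. It is a sketch only: the function name `detect_device` is hypothetical, and whether the demo can use an M1's GPU through the MPS backend is not stated in this thread.

```python
def detect_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is no accelerator to report
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("Generation would run on:", detect_device())
```

If this prints `cpu`, slow generation with a 7B model is expected.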

hecor commented 1 year ago

got it, thanks~

zay95 commented 1 year ago

Hi, I ran demo/app.py on a remote server with the 7B model, and the terminal shows: Reloading javascript... Running on local URL: http://127.0.0.1:7860

But I can't open that URL in Chrome on my local machine.

guoday commented 1 year ago

Set share=True in app.py and use the public URL.
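For context, 127.0.0.1 is the server's own loopback address, so a browser on a different machine cannot reach it. The sketch below shows how `share=True` is typically passed to Gradio's `launch()`. The echo app and both function names are hypothetical stand-ins; the actual launch call in app.py may look different.

```python
from typing import Any, Dict


def build_launch_kwargs(remote: bool) -> Dict[str, Any]:
    """Keyword arguments for Gradio's launch() (hypothetical helper).

    share=True asks Gradio for a temporary public URL, which is useful when
    the app runs on a remote server and the browser is on another machine.
    """
    return {"share": remote}


def launch_echo_demo(remote: bool) -> None:
    """Start a stand-in echo app; requires the gradio package."""
    import gradio as gr  # imported here so the helper above works without it

    demo = gr.Interface(fn=lambda text: text, inputs="text", outputs="text")
    demo.launch(**build_launch_kwargs(remote))
```

Calling `launch_echo_demo(remote=True)` on the server should print a public URL in addition to the local one.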

project-baize / baize-chatbot

run app.py error #14