Flucc opened 1 year ago
I got the same issue. Have you found a way to fix it?
I was able to run the web UI after changing half=False to half=True in the defaults of the load_llama_model_4bit_low_ram function in autograd_4bit.py:

- def load_llama_model_4bit_low_ram(config_path, model_path, groupsize=-1, half=False, device_map="auto", seqlen=2048):
+ def load_llama_model_4bit_low_ram(config_path, model_path, groupsize=-1, half=True, device_map="auto", seqlen=2048):
With this I was able to run the web UI, but when I send a request I get the following error back:
Traceback (most recent call last):
File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/gradio/routes.py", line 393, in run_predict
output = await app.get_blocks().process_api(
File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/gradio/blocks.py", line 1108, in process_api
result = await self.call_function(
File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/gradio/blocks.py", line 929, in call_function
prediction = await anyio.to_thread.run_sync(
File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/gradio/utils.py", line 490, in async_iteration
return next(iterator)
File "/alpaca_lora_4bit/text-generation-webui/modules/text_generation.py", line 119, in generate_reply
if any((shared.is_RWKV, shared.is_llamacpp)):
AttributeError: module 'modules.shared' has no attribute 'is_llamacpp'
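One way to sidestep that AttributeError until the webui code is updated (a sketch, not the project's own fix; the `shared` namespace below is a stand-in for `modules.shared`) is to look the flags up with getattr and a default instead of attribute access:

```python
import types

# Stand-in for modules.shared; the installed text-generation-webui
# version may predate the is_llamacpp flag, which is what raises.
shared = types.SimpleNamespace(is_RWKV=False)

# Defensive lookup: getattr with a default returns False instead of
# raising AttributeError when the attribute does not exist yet.
needs_special_backend = any((
    getattr(shared, 'is_RWKV', False),
    getattr(shared, 'is_llamacpp', False),
))
print(needs_special_backend)  # False
```

The cleaner fix is matching the repo versions the guide expects, but the getattr pattern keeps a mixed checkout from crashing.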
I tried running inference.py and got this back:
Converted as Half.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Loaded the model in 2.17 seconds.
Fitting 4bit scales and zeros to half
Apply AMP Wrapper ...
I think the meaning of life is to live a full and rewarding life, and to leave a positive impact on the world after you
1.1310465335845947
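The tokenizer warning above is usually harmless, but one common workaround to silence it (a sketch; the file name follows the Hugging Face convention, and the helper and paths here are assumptions) is to patch tokenizer_config.json in the model folder so the class name matches what current transformers expects:

```python
import json

# Hypothetical helper: rewrite the old 'LLaMATokenizer' spelling to the
# 'LlamaTokenizer' spelling that recent transformers releases use.
def fix_tokenizer_class(config_path):
    with open(config_path) as f:
        cfg = json.load(f)
    if cfg.get('tokenizer_class') == 'LLaMATokenizer':
        cfg['tokenizer_class'] = 'LlamaTokenizer'
        with open(config_path, 'w') as f:
            json.dump(cfg, f, indent=2)
    return cfg['tokenizer_class']

# Example (path is an assumption):
# fix_tokenizer_class('./llama-7b-4bit/tokenizer_config.json')
```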
Having the same issue as Flucc.
Maybe I should revert the change that made bias a parameter.
Also, try this in a custom monkey patch:
print('Apply auto switch and half')
model.half()
for n, m in model.named_modules():
    if isinstance(m, Autograd4bitQuantLinear) or isinstance(m, Linear4bitLt):
        if m.groupsize == -1:
            m.zeros = m.zeros.half()
        m.scales = m.scales.half()
print('Apply AMP Wrapper ...')
from amp_wrapper import AMPWrapper
wrapper = AMPWrapper(model)
wrapper.apply_generate()
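For context, apply_generate presumably follows the usual method-wrapping pattern: replace model.generate with a version that runs inside an autocast context. A minimal pure-Python sketch of that pattern (the real AMPWrapper in amp_wrapper.py would use torch.cuda.amp.autocast; the nullcontext stand-in and the class below are assumptions, not the repo's code):

```python
import contextlib

class WrapperSketch:
    """Hypothetical sketch of the AMP-wrapper pattern, not the real class."""
    def __init__(self, model, context=None):
        self.model = model
        # In the real wrapper this would be a torch autocast context;
        # nullcontext is a do-nothing stand-in for the sketch.
        self.context = context if context is not None else contextlib.nullcontext()

    def apply_generate(self):
        original = self.model.generate
        def wrapped(*args, **kwargs):
            with self.context:  # mixed-precision region in the real code
                return original(*args, **kwargs)
        self.model.generate = wrapped
```

After apply_generate(), every call to model.generate runs inside the context, so callers need no changes.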
And I think the "AttributeError: module 'modules.shared' has no attribute 'is_llamacpp'" error comes from text-generation-webui itself?
I was following the guide posted for installing this alongside text-generation-webui. I have now run into an issue when I run it.