johnsmith0031 / alpaca_lora_4bit


TypeError: cannot assign 'torch.cuda.HalfTensor' as parameter 'bias' (torch.nn.Parameter or None expected) #49

Open · Flucc opened this issue 1 year ago

Flucc commented 1 year ago

I was following the guide posted for installing this along with text-generation-webui, and I have now run into an issue when I run it.


```
Loading ../llama-13b-4bit.pt ...
Loading Model ...
Loaded the model in 32.67 seconds.
../alpaca13b_lora/ Lora Applied.
Apply auto switch and half
Traceback (most recent call last):
  File "/home/administrator/alpaca_lora_4bit/text-generation-webui/server.py", line 276, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/administrator/alpaca_lora_4bit/text-generation-webui/custom_monkey_patch.py", line 28, in load_model_llama
    m.bias = m.bias.half()
  File "/home/administrator/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1635, in __setattr__
    raise TypeError("cannot assign '{}' as parameter '{}' "
TypeError: cannot assign 'torch.cuda.HalfTensor' as parameter 'bias' (torch.nn.Parameter or None expected)
```

I have tried reinstalling packages but I'm unsure of how to continue without breaking anything else.
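For reference, a minimal sketch of one way around this specific `TypeError`, assuming the failing line is the `m.bias = m.bias.half()` shown in the traceback: keep the assignment target a `Parameter` by re-wrapping the converted tensor, or convert the parameter's data in place.

```python
import torch.nn as nn

# Sketch only (not the repo's code): PyTorch refuses to assign a plain tensor
# to a slot registered as a Parameter, so either re-wrap it or convert in place.
for name, module in model.named_modules():
    if isinstance(getattr(module, 'bias', None), nn.Parameter):
        # Option 1: re-wrap the half-precision tensor as a Parameter
        module.bias = nn.Parameter(module.bias.data.half(), requires_grad=False)
        # Option 2 (equivalent effect): module.bias.data = module.bias.data.half()
```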
YrKpFk commented 1 year ago

I got the same issue. Have you found a way to fix it?

YrKpFk commented 1 year ago

I was able to run the web UI, but I got a new issue:

  1. I went to the `load_llama_model_4bit_low_ram` function located in `autograd_4bit.py`.
  2. I changed `half=False` to `half=True` (an alternative to editing the default is sketched after the traceback below):

     ```diff
     - def load_llama_model_4bit_low_ram(config_path, model_path, groupsize=-1, half=False, device_map="auto", seqlen=2048):
     + def load_llama_model_4bit_low_ram(config_path, model_path, groupsize=-1, half=True, device_map="auto", seqlen=2048):
     ```

With this I was able to run the web UI, but when I send a request I get the following error back:

```
Traceback (most recent call last):
  File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/gradio/blocks.py", line 929, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/alpaca_lora_4bit/env/lib/python3.10/site-packages/gradio/utils.py", line 490, in async_iteration
    return next(iterator)
  File "/alpaca_lora_4bit/text-generation-webui/modules/text_generation.py", line 119, in generate_reply
    if any((shared.is_RWKV, shared.is_llamacpp)):
AttributeError: module 'modules.shared' has no attribute 'is_llamacpp'
```
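For what it's worth, a sketch of the same change without editing the default in `autograd_4bit.py`: pass `half=True` at the call site instead. The signature is taken from the diff above; the config/model paths and the `(model, tokenizer)` return values are assumptions about how the loader is called in `custom_monkey_patch.py`.

```python
# Sketch only: request half precision where the model is loaded rather than
# changing the function's default. Paths are placeholders; the return values
# are assumed to be (model, tokenizer).
from autograd_4bit import load_llama_model_4bit_low_ram

model, tokenizer = load_llama_model_4bit_low_ram(
    config_path="./llama-13b-4bit/",    # placeholder
    model_path="../llama-13b-4bit.pt",  # placeholder
    groupsize=-1,
    half=True,
    device_map="auto",
    seqlen=2048,
)
```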

I tried running `inference.py` and I got this back:

```
Converted as Half.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Loaded the model in 2.17 seconds.
Fitting 4bit scales and zeros to half
Apply AMP Wrapper ...
 I think the meaning of life is to live a full and rewarding life, and to leave a positive impact on the world after you
1.1310465335845947
```
maxdeleon commented 1 year ago

Having the same issue as Flucc

johnsmith0031 commented 1 year ago

Maybe I should revert the change that set bias as a parameter.

Also, try this in the custom monkey patch:

```python
print('Apply auto switch and half')
model.half()
for n, m in model.named_modules():
    if isinstance(m, Autograd4bitQuantLinear) or isinstance(m, Linear4bitLt):
        if m.groupsize == -1:
            m.zeros = m.zeros.half()
        m.scales = m.scales.half()

print('Apply AMP Wrapper ...')
from amp_wrapper import AMPWrapper
wrapper = AMPWrapper(model)
wrapper.apply_generate()
```
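This snippet only re-converts `zeros` and `scales` after `model.half()` and never assigns to `m.bias`, so it should sidestep the Parameter assignment that raised the TypeError above.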

And I think the "AttributeError: module 'modules.shared' has no attribute 'is_llamacpp'" error comes from text-generation-webui?
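If it is, a possible stopgap (a sketch only, assuming the check shown in the traceback above lives in `modules/text_generation.py` of a mismatched text-generation-webui version) is to make sure the flag exists before the web UI reads it:

```python
# Sketch only: define the flag if the installed webui version doesn't have it,
# so `any((shared.is_RWKV, shared.is_llamacpp))` doesn't raise AttributeError.
import modules.shared as shared

if not hasattr(shared, 'is_llamacpp'):
    shared.is_llamacpp = False
```

The real fix is probably to use matching versions of text-generation-webui and the monkey patch.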