oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

AttributeError: 'LlamaGPTQForCausalLM' object has no attribute 'dtype' (minigpt4-13b) #2545

Closed. xjdeng closed this issue 1 year ago.

xjdeng commented 1 year ago

Describe the bug

I just started having this issue yesterday when running minigpt4-13b with TheBloke_Wizard-Vicuna-13B-Uncensored-GPTQ, specifically when I ask it to describe an image.

Is there an existing issue for this?

Reproduction

Fire up a free Colab instance and run the following:

import time
t0 = time.time()

# Clone the web UI and install its dependencies
!git clone https://github.com/oobabooga/text-generation-webui
%cd text-generation-webui
!pip install -r requirements.txt

# Install the minigpt-4 multimodal pipeline
!git clone https://github.com/Wojtab/minigpt-4-pipeline extensions/multimodal/pipelines/minigpt-4-pipeline
!pip install -r extensions/multimodal/pipelines/minigpt-4-pipeline/requirements.txt

# Download the quantized model
!python download-model.py TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ

# Build the GPTQ-for-LLaMa CUDA kernel and copy its files into the web UI directory
!git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
%cd GPTQ-for-LLaMa
!python setup_cuda.py install
%cd ..
!cp -r GPTQ-for-LLaMa/* .

print(time.time() - t0)

# Launch with the multimodal extension and the minigpt4-13b pipeline
!python server.py --extensions multimodal --multimodal-pipeline minigpt4-13b --share --chat --model TheBloke_Wizard-Vicuna-13B-Uncensored-GPTQ --wbits 4 --groupsize 128 --auto-devices

Then load the Gradio link, drag an image into the multimodal section, enter a prompt such as "Describe the image", and hit Generate. Generation will hang, and if you go back to your Colab notebook you will see the error in the Logs section below.

Screenshot

No response

Logs

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 414, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1067, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 339, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 332, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 315, in run_sync_iterator_async
    return next(iterator)
  File "/content/text-generation-webui/modules/chat.py", line 336, in generate_chat_reply_wrapper
    for i, history in enumerate(generate_chat_reply(text, shared.history, state, regenerate, _continue, loading_message=True)):
  File "/content/text-generation-webui/modules/chat.py", line 321, in generate_chat_reply
    for history in chatbot_wrapper(text, history, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message):
  File "/content/text-generation-webui/modules/chat.py", line 238, in chatbot_wrapper
    for j, reply in enumerate(generate_reply(prompt + cumulative_reply, state, eos_token=eos_token, stopping_strings=stopping_strings, is_chat=True)):
  File "/content/text-generation-webui/modules/text_generation.py", line 24, in generate_reply
    for result in _generate_reply(*args, **kwargs):
  File "/content/text-generation-webui/modules/text_generation.py", line 194, in _generate_reply
    for reply in generate_func(question, original_question, seed, state, eos_token, stopping_strings, is_chat=is_chat):
  File "/content/text-generation-webui/modules/text_generation.py", line 243, in generate_reply_HF
    question, input_ids, inputs_embeds = apply_extensions('tokenizer', state, question, input_ids, None)
  File "/content/text-generation-webui/modules/extensions.py", line 193, in apply_extensions
    return EXTENSION_MAP[typ](*args, **kwargs)
  File "/content/text-generation-webui/modules/extensions.py", line 107, in _apply_tokenizer_extensions
    return getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
  File "/content/text-generation-webui/extensions/multimodal/script.py", line 80, in tokenizer_modifier
    prompt, input_ids, input_embeds, total_embedded = multimodal_embedder.forward(prompt, state, params)
  File "/content/text-generation-webui/extensions/multimodal/multimodal_embedder.py", line 172, in forward
    prompt_parts = self._embed(prompt_parts)
  File "/content/text-generation-webui/extensions/multimodal/multimodal_embedder.py", line 148, in _embed
    embedded = self.pipeline.embed_images([parts[i].image for i in image_indicies])
  File "/content/text-generation-webui/extensions/multimodal/pipelines/minigpt-4-pipeline/minigpt4_pipeline.py", line 46, in embed_images
    return image_emb.to(shared.model.device, dtype=shared.model.dtype)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LlamaGPTQForCausalLM' object has no attribute 'dtype'

System Info

Free Colab instance
oobabooga commented 1 year ago

Apparently multimodal didn't like AutoGPTQ
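
For context, the traceback above points at the call image_emb.to(shared.model.device, dtype=shared.model.dtype) in minigpt4_pipeline.py. AutoGPTQ's LlamaGPTQForCausalLM is an nn.Module wrapper that does not expose a dtype attribute, so the lookup falls through to nn.Module.__getattr__ and raises. Below is a minimal defensive sketch (not the fix that was merged upstream) that falls back to a parameter's dtype when the wrapper lacks the attribute; the helper name and its use site are only illustrative:

import torch

def model_dtype(model, default=torch.float16):
    # Best-effort dtype lookup: use .dtype if the wrapper exposes it,
    # otherwise fall back to the dtype of the first parameter.
    dtype = getattr(model, "dtype", None)
    if dtype is not None:
        return dtype
    try:
        return next(model.parameters()).dtype
    except StopIteration:
        return default

# hypothetical use inside embed_images():
# return image_emb.to(shared.model.device, dtype=model_dtype(shared.model))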

rlederer-C136 commented 1 year ago

I also experience this exact error on a home setup with a 3090 24GB card. It worked great before I ran update_linux two days ago. Now multimodal is broken. It worked really well before with almost all Llama-based models.

yunho-c commented 1 year ago

Does anyone have a temporary fix for this error (maybe installing an older branch)?

rlederer-C136 commented 1 year ago

ATM, I wiped my one-click folder and started fresh (I am on Ubuntu). I then cloned the one-click git repo again and edited the command args in webui.py to include "--gptq-for-llama" before running start_linux.sh. Then run your start script and see how that goes. It all finally went back together for me last night; streaming speed is good again and multimodal works again as well. There is some issue with AutoGPTQ, so switch back to GPTQ-for-llama for now.
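
For reference, in the one-click installers of that era the launch flags were kept in a string inside webui.py; a sketch of the edit described above (the variable name is an assumption and may differ between installer versions):

# webui.py (one-click installer) -- variable name assumed; adjust to your version
CMD_FLAGS = '--chat --gptq-for-llama'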

oobabooga commented 1 year ago

The issue seems to be fixed after this commit: https://github.com/oobabooga/text-generation-webui/commit/e471919e6d504e85ac1aa58ad6bf0d46d0d9323d

xjdeng commented 1 year ago

Appreciate the effort, but after checking out the latest commit it seems to be working, except that text generation is REALLY slow. I thought it was an issue with my Colab, but I reverted to 19f7868 and it was back to its usual speed.
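
For anyone wanting to reproduce the comparison, reverting to that commit before launching is just a checkout inside the cloned repo (reinstalling requirements is a precaution, only needed if pinned dependencies changed between commits):

# inside the text-generation-webui checkout
!git checkout 19f7868
!pip install -r requirements.txt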

oobabooga commented 1 year ago

I get the same speed for minigpt4-13b with AutoGPTQ or GPTQ-for-LLaMa. It is possible that the AutoGPTQ wheel is not optimized for the Colab GPU or environment. A workaround for now is to use --gptq-for-llama (or check gptq-for-llama in the UI before loading the model).
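
Applied to the Colab reproduction above, the workaround amounts to appending the flag to the launch command:

!python server.py --extensions multimodal --multimodal-pipeline minigpt4-13b --share --chat --model TheBloke_Wizard-Vicuna-13B-Uncensored-GPTQ --wbits 4 --groupsize 128 --auto-devices --gptq-for-llama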

xjdeng commented 1 year ago

That fixed the speed issue!