oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Multimodal LLAVA 13B does not work #4378

Closed. EugeoSynthesisThirtyTwo closed this issue 10 months ago.

EugeoSynthesisThirtyTwo commented 1 year ago

Describe the bug

LLaVA can generate text, but it raises an error when trying to read an image.

Is there an existing issue for this?

Reproduction

Screenshot

[Screenshot: UI error]

Logs

(C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env) C:\Users\Armaguedin\Documents\dev\python\text-generation-webui>python server.py --model TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True --multimodal-pipeline llava-v1.5-13b
bin C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
2023-10-24 20:28:53 INFO:Loading TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True...
2023-10-24 20:28:58 INFO:Loaded the model in 4.30 seconds.
2023-10-24 20:28:58 INFO:Loading the extension "multimodal"...
2023-10-24 20:28:58 INFO:Loading the extension "gallery"...
C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: llama or set allow_custom_value=True.
  warnings.warn(
2023-10-24 20:28:58 INFO:LLaVA - Loading CLIP from openai/clip-vit-large-patch14-336 as torch.float32 on cuda:0...
2023-10-24 20:29:00 INFO:LLaVA - Loading projector from liuhaotian/llava-v1.5-13b as torch.float32 on cuda:0...
2023-10-24 20:29:00 INFO:LLaVA supporting models loaded, took 2.71 seconds
2023-10-24 20:29:00 INFO:Multimodal: loaded pipeline llava-v1.5-13b from pipelines/llava (LLaVA_v1_5_13B_Pipeline)
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
2023-10-24 20:29:11 INFO:Deleted logs\chat\Assistant\20231024-18-58-37.json.
Output generated in 2.47 seconds (6.89 tokens/s, 17 tokens, context 67, seed 1371373692)
2023-10-24 20:29:27 INFO:Deleted logs\chat\Assistant\20231024-20-29-11.json.
Output generated in 1.79 seconds (5.60 tokens/s, 10 tokens, context 127, seed 981241969)
Traceback (most recent call last):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1550, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1199, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 519, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 649, in gen_wrapper
    yield from f(*args, **kwargs)
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\chat.py", line 329, in generate_chat_reply_wrapper
    for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True)):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\chat.py", line 297, in generate_chat_reply
    for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\chat.py", line 237, in chatbot_wrapper
    for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True)):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\text_generation.py", line 30, in generate_reply
    for result in _generate_reply(*args, **kwargs):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\text_generation.py", line 77, in _generate_reply
    for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\text_generation.py", line 309, in generate_reply_HF
    question, input_ids, inputs_embeds = apply_extensions('tokenizer', state, question, input_ids, None)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\extensions.py", line 224, in apply_extensions
    return EXTENSION_MAP[typ](*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\extensions.py", line 127, in _apply_tokenizer_extensions
    prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\extensions\multimodal\script.py", line 89, in tokenizer_modifier
    prompt, input_ids, input_embeds, total_embedded = multimodal_embedder.forward(
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\extensions\multimodal\multimodal_embedder.py", line 172, in forward
    prompt_parts = self._embed(prompt_parts)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\extensions\multimodal\multimodal_embedder.py", line 154, in _embed
    parts[i].embedding = self.pipeline.embed_tokens(part.input_ids)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\extensions\multimodal\pipelines\llava\llava.py", line 91, in embed_tokens
    raise ValueError('The embed_tokens method has not been found for this loader.')
ValueError: The embed_tokens method has not been found for this loader.

System Info

Windows 10
RTX 3080 Ti 16GB VRAM
Govindai1 commented 1 year ago


Running it through the Session tab is also throwing an error. If I run it the way you described, I get an error too; I wonder how it's working on your system.

EugeoSynthesisThirtyTwo commented 1 year ago

> Running it through the Session tab is also throwing an error. If I run it the way you described, I get an error too; I wonder how it's working on

I don't know why loading the model works on my system :/ Did you somehow manage to run it using a different method? (other command-line args, for instance)

yhyu13 commented 1 year ago

@oobabooga I had the same issue: when the loader is not explicitly set to AutoGPTQ, the embed_tokens method cannot be found. There must be a bug in the default loader for GPTQ LLaVA v1.5 models.
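For context, the error in the original report comes from embed_tokens in extensions/multimodal/pipelines/llava/llava.py (line 91 in the traceback above), which looks for an input-embedding function on whatever object the loader produced and raises when it cannot find one. A minimal sketch of that kind of lookup, assuming a HuggingFace-style model wrapper; the names and structure here are illustrative, not the exact upstream code:

import torch

def embed_tokens(model, input_ids: torch.Tensor) -> torch.Tensor:
    # Probe the loader's wrapper for a HuggingFace-style embedding module,
    # i.e. model.model.embed_tokens or model.model.model.embed_tokens.
    inner = getattr(model, 'model', None)
    for candidate in (inner, getattr(inner, 'model', None)):
        if candidate is not None and hasattr(candidate, 'embed_tokens'):
            return candidate.embed_tokens(input_ids)
    # Loaders that wrap the weights in their own classes never expose this
    # attribute, which produces the error seen in the traceback above.
    raise ValueError('The embed_tokens method has not been found for this loader.')

Forcing the AutoGPTQ loader keeps that module hierarchy intact, which is presumably why the workaround below works.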

@EugeoSynthesisThirtyTwo You need to explicitly specify AutoGPTQ and disable ExLlama for the time being:

MODEL=llava-v1.5-13B-GPTQ
python server.py --model $MODEL \
    --loader autogptq \
    --disable_exllama \
    --multimodal-pipeline llava-v1.5-13b
EugeoSynthesisThirtyTwo commented 1 year ago

@yhyu13 Thank you, that got rid of this error, but unfortunately I now have another one. I created a new bug report at https://github.com/oobabooga/text-generation-webui/issues/4398 because I don't know whether it's related.

EugeoSynthesisThirtyTwo commented 1 year ago

Never mind, I made a mistake! It works now, thank you! I guess I should leave this thread open, since there is still a bug when following the guide strictly.

Govindai1 commented 1 year ago

C:\AI\text-generation-webui>python server.py --model TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True --multimodal-pipeline llava-v1.5-13b --disable_exllama --loader autogptq
bin C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so
C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
function 'cadam32bit_grad_fp32' not found
2023-10-28 00:13:44 INFO:Loading settings from settings.yaml...
2023-10-28 00:13:44 INFO:Loading TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True...
2023-10-28 00:13:44 INFO:The AutoGPTQ params are: {'model_basename': 'model', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None, 'use_cuda_fp16': True, 'disable_exllama': True}
2023-10-28 00:13:44 WARNING:CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
  1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
  2. You are using pytorch without CUDA support.
  3. CUDA and nvcc are not installed in your device.
2023-10-28 00:13:44 WARNING:CUDA extension not installed.
Traceback (most recent call last):
  File "C:\AI\text-generation-webui\server.py", line 223, in <module>
    shared.model, shared.tokenizer = load_model(model_name)
  File "C:\AI\text-generation-webui\modules\models.py", line 84, in load_model
    output = load_func_map[loader](model_name)
  File "C:\AI\text-generation-webui\modules\models.py", line 330, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "C:\AI\text-generation-webui\modules\AutoGPTQ_loader.py", line 58, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\auto_gptq\modeling\auto.py", line 108, in from_quantized
    return quant_func(
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\auto_gptq\modeling\_base.py", line 875, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\utils\modeling.py", line 1357, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\utils\modeling.py", line 1186, in load_state_dict
    return safe_load_file(checkpoint_file, device=list(device_map.values())[0])
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\safetensors\torch.py", line 311, in load_file
    result[k] = f.get_tensor(k)
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I am getting this error; can anyone help?

yhyu13 commented 1 year ago

@Govindai1

Are you on a CUDA device? Then you need to install a PyTorch build for the appropriate CUDA version. Check out the download section of the official PyTorch website.

Correct me if I am wrong, but textgen mostly supports torch 2.0.1 with CUDA 11.8.
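As a quick sanity check (an illustrative snippet, run inside the same Python environment that launches the webui), you can confirm which PyTorch build is installed:

import torch

print(torch.__version__)          # e.g. '2.0.1+cu118'; a '+cpu' suffix means a CPU-only build
print(torch.version.cuda)         # CUDA version the wheel was built against, or None
print(torch.cuda.is_available())  # must be True, otherwise "Torch not compiled with CUDA enabled"

If it reports a CPU-only build, reinstalling from the CUDA 11.8 wheel index, for example pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118, should clear the assertion above.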

Reminic commented 1 year ago

In case somebody else has the same issue: I had the "AttributeError: 'NoneType' object has no attribute 'lower'" message on my Windows 11 PC. It finally went away when I used the CMD_FLAGS.txt file to set the command-line options instead of the OOBABOOGA_FLAGS environment variable.
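For reference, CMD_FLAGS.txt lives in the text-generation-webui root directory and is read by the start scripts. With the workaround from this thread it would contain something like the following single line (model name taken from the report above; adjust to your own setup):

--model TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True --loader autogptq --disable_exllama --multimodal-pipeline llava-v1.5-13b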

github-actions[bot] commented 10 months ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.