oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Multimodal LLAVA 13B does not work #4378

Closed. EugeoSynthesisThirtyTwo closed this issue 10 months ago.

EugeoSynthesisThirtyTwo commented 1 year ago

Describe the bug

LLaVA can generate text, but it raises an error when trying to read an image.

Is there an existing issue for this?

Reproduction

Screenshot

[Screenshot: UI error]

Logs

(C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env) C:\Users\Armaguedin\Documents\dev\python\text-generation-webui>python server.py --model TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True --multimodal-pipeline llava-v1.5-13b
bin C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
2023-10-24 20:28:53 INFO:Loading TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True...
2023-10-24 20:28:58 INFO:Loaded the model in 4.30 seconds.
2023-10-24 20:28:58 INFO:Loading the extension "multimodal"...
2023-10-24 20:28:58 INFO:Loading the extension "gallery"...
C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: llama or set allow_custom_value=True.
  warnings.warn(
2023-10-24 20:28:58 INFO:LLaVA - Loading CLIP from openai/clip-vit-large-patch14-336 as torch.float32 on cuda:0...
2023-10-24 20:29:00 INFO:LLaVA - Loading projector from liuhaotian/llava-v1.5-13b as torch.float32 on cuda:0...
2023-10-24 20:29:00 INFO:LLaVA supporting models loaded, took 2.71 seconds
2023-10-24 20:29:00 INFO:Multimodal: loaded pipeline llava-v1.5-13b from pipelines/llava (LLaVA_v1_5_13B_Pipeline)
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
2023-10-24 20:29:11 INFO:Deleted logs\chat\Assistant\20231024-18-58-37.json.
Output generated in 2.47 seconds (6.89 tokens/s, 17 tokens, context 67, seed 1371373692)
2023-10-24 20:29:27 INFO:Deleted logs\chat\Assistant\20231024-20-29-11.json.
Output generated in 1.79 seconds (5.60 tokens/s, 10 tokens, context 127, seed 981241969)
Traceback (most recent call last):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1550, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1199, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 519, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 649, in gen_wrapper
    yield from f(*args, **kwargs)
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\chat.py", line 329, in generate_chat_reply_wrapper
    for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True)):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\chat.py", line 297, in generate_chat_reply
    for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\chat.py", line 237, in chatbot_wrapper
    for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True)):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\text_generation.py", line 30, in generate_reply
    for result in _generate_reply(*args, **kwargs):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\text_generation.py", line 77, in _generate_reply
    for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\text_generation.py", line 309, in generate_reply_HF
    question, input_ids, inputs_embeds = apply_extensions('tokenizer', state, question, input_ids, None)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\extensions.py", line 224, in apply_extensions
    return EXTENSION_MAP[typ](*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\modules\extensions.py", line 127, in _apply_tokenizer_extensions
    prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\extensions\multimodal\script.py", line 89, in tokenizer_modifier
    prompt, input_ids, input_embeds, total_embedded = multimodal_embedder.forward(
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\extensions\multimodal\multimodal_embedder.py", line 172, in forward
    prompt_parts = self._embed(prompt_parts)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\extensions\multimodal\multimodal_embedder.py", line 154, in _embed
    parts[i].embedding = self.pipeline.embed_tokens(part.input_ids)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Armaguedin\Documents\dev\python\text-generation-webui\extensions\multimodal\pipelines\llava\llava.py", line 91, in embed_tokens
    raise ValueError('The embed_tokens method has not been found for this loader.')
ValueError: The embed_tokens method has not been found for this loader.

System Info

Windows 10
RTX 3080 Ti 16GB VRAM
Govindai1 commented 1 year ago


Running it through the Session tab is also throwing an error. If I run it the way you described, I get an error too; I wonder how it's working on your system.

EugeoSynthesisThirtyTwo commented 1 year ago

> Running it through the Session tab is also throwing an error. If I run it the way you described, I get an error too; I wonder how it's working on

I don't know why loading the model works on my system :/ Did you somehow manage to run it using a different method? (other command-line args, for instance)

yhyu13 commented 1 year ago

@oobabooga I had the same issue: when the loader is not explicitly set to AutoGPTQ, the embed_tokens method cannot be found. There must be a bug in the default loader for GPTQ LLaVA v1.5 models.
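For context, the error in the original report comes from embed_tokens in extensions/multimodal/pipelines/llava/llava.py (line 91 in the traceback above), which looks for an input-embedding function on whatever object the loader produced and raises when it cannot find one. A minimal sketch of that kind of lookup, assuming a HuggingFace-style model wrapper; the names and structure here are illustrative, not the exact upstream code:

import torch

def embed_tokens(model, input_ids: torch.Tensor) -> torch.Tensor:
    # Probe the loader's wrapper for a HuggingFace-style embedding module,
    # i.e. model.model.embed_tokens or model.model.model.embed_tokens.
    inner = getattr(model, 'model', None)
    for candidate in (inner, getattr(inner, 'model', None)):
        if candidate is not None and hasattr(candidate, 'embed_tokens'):
            return candidate.embed_tokens(input_ids)
    # Loaders that wrap the weights in their own classes never expose this
    # attribute, which produces the error seen in the traceback above.
    raise ValueError('The embed_tokens method has not been found for this loader.')

Forcing the AutoGPTQ loader keeps that module hierarchy intact, which is presumably why the workaround below works.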

@EugeoSynthesisThirtyTwo You need to explicitly specify AutoGPTQ and disable ExLlama for the time being:

MODEL=llava-v1.5-13B-GPTQ
python server.py --model $MODEL \
    --loader autogptq \
    --disable_exllama \
    --multimodal-pipeline llava-v1.5-13b
EugeoSynthesisThirtyTwo commented 1 year ago

@yhyu13 Thank you, that got rid of this error, but unfortunately I now have another one. I created a new bug report at https://github.com/oobabooga/text-generation-webui/issues/4398 because I don't know whether it's related.

EugeoSynthesisThirtyTwo commented 1 year ago

Never mind, I made a mistake! It works now, thank you! I guess I should leave this thread open, since there is still a bug when following the guide strictly.

Govindai1 commented 1 year ago

C:\AI\text-generation-webui>python server.py --model TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True --multimodal-pipeline llava-v1.5-13b --disable_exllama --loader autogptq
bin C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so
C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
function 'cadam32bit_grad_fp32' not found
2023-10-28 00:13:44 INFO:Loading settings from settings.yaml...
2023-10-28 00:13:44 INFO:Loading TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True...
2023-10-28 00:13:44 INFO:The AutoGPTQ params are: {'model_basename': 'model', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None, 'use_cuda_fp16': True, 'disable_exllama': True}
2023-10-28 00:13:44 WARNING:CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
  1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
  2. You are using pytorch without CUDA support.
  3. CUDA and nvcc are not installed in your device.
2023-10-28 00:13:44 WARNING:CUDA extension not installed.
Traceback (most recent call last):
  File "C:\AI\text-generation-webui\server.py", line 223, in <module>
    shared.model, shared.tokenizer = load_model(model_name)
  File "C:\AI\text-generation-webui\modules\models.py", line 84, in load_model
    output = load_func_map[loader](model_name)
  File "C:\AI\text-generation-webui\modules\models.py", line 330, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "C:\AI\text-generation-webui\modules\AutoGPTQ_loader.py", line 58, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\auto_gptq\modeling\auto.py", line 108, in from_quantized
    return quant_func(
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\auto_gptq\modeling\_base.py", line 875, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\utils\modeling.py", line 1357, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\utils\modeling.py", line 1186, in load_state_dict
    return safe_load_file(checkpoint_file, device=list(device_map.values())[0])
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\safetensors\torch.py", line 311, in load_file
    result[k] = f.get_tensor(k)
  File "C:\Users\Govind\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I am getting this error; can anyone help?

yhyu13 commented 1 year ago

@Govindai1

Are you on a CUDA device? Then you need to install a PyTorch build for the appropriate CUDA version. Check out the download section of the official PyTorch website.

Correct me if I am wrong, but textgen mostly supports torch 2.0.1 with CUDA 11.8.
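As a quick sanity check (an illustrative snippet, run inside the same Python environment that launches the webui), you can confirm which PyTorch build is installed:

import torch

print(torch.__version__)          # e.g. '2.0.1+cu118'; a '+cpu' suffix means a CPU-only build
print(torch.version.cuda)         # CUDA version the wheel was built against, or None
print(torch.cuda.is_available())  # must be True, otherwise "Torch not compiled with CUDA enabled"

If it reports a CPU-only build, reinstalling from the CUDA 11.8 wheel index, for example pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118, should clear the assertion above.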

Reminic commented 1 year ago

In case somebody else has the same issue: I had the "AttributeError: 'NoneType' object has no attribute 'lower'" message on my Windows 11 PC. It finally went away when I used the CMD_FLAGS.txt file to set the command-line options instead of the OOBABOOGA_FLAGS environment variable.
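For reference, CMD_FLAGS.txt lives in the text-generation-webui root directory and is read by the start scripts. With the workaround from this thread it would contain something like the following single line (model name taken from the report above; adjust to your own setup):

--model TheBloke_llava-v1.5-13B-GPTQ_gptq-4bit-32g-actorder_True --loader autogptq --disable_exllama --multimodal-pipeline llava-v1.5-13b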

github-actions[bot] commented 10 months ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.