oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Message "CUDA extension not installed" but CUDA 12 is installed on Windows #5721

Closed: masbicudo closed this issue 5 months ago

masbicudo commented 7 months ago

Describe the bug

I am trying to use the multimodal model wojtab_llava-13b-v0-4bit-128g on Windows using CUDA. (See my comments below for further developments on this issue.)

Is there an existing issue for this?

Reproduction

I used the installer start_windows.bat, then restarted with the option --multimodal-pipeline llava-13b. I downloaded the model wojtab/llava-13b-v0-4bit-128g and loaded it with GPTQ-for-LLaMa using:

wbits=4
groupsize=128
model_type=llama

After clicking the Load button, the model loads as if everything is OK, but the AI does not give any responses.

I noticed that the console output contains the message "CUDA extension not installed" twice. Also, sending requests to the AI results in an error: NameError: name 'quant_cuda' is not defined.
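
For completeness, this corresponds roughly to the following launch command, run from within cmd_windows.bat (the GPTQ flags mirror the UI settings listed above and, as far as I know, are still accepted; the same options can also just be set in the UI):

python server.py --multimodal-pipeline llava-13b --model wojtab_llava-13b-v0-4bit-128g --wbits 4 --groupsize 128 --model_type llama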

Screenshot

No response

Logs

23:12:42-780091 INFO     Starting Text generation web UI
23:12:42-783600 INFO     Loading the extension "multimodal"
23:12:42-787386 INFO     Loading the extension "gallery"
23:12:42-999049 INFO     LLaVA - Loading CLIP from openai/clip-vit-large-patch14 as torch.float32 on cuda:0...
23:12:44-963047 INFO     LLaVA - Loading projector from liuhaotian/LLaVA-13b-delta-v0 as torch.float32 on cuda:0...
23:12:45-158792 INFO     LLaVA supporting models loaded, took 2.16 seconds
23:12:45-160295 INFO     Multimodal: loaded pipeline llava-13b from pipelines/llava (LLaVA_v0_13B_Pipeline)

Running on local URL:  http://127.0.0.1:7860

23:13:10-235869 INFO     Loading "wojtab_llava-13b-v0-4bit-128g"
CUDA extension not installed.
CUDA extension not installed.
23:13:10-264514 INFO     Found the following quantized model: models\wojtab_llava-13b-v0-4bit-128g\llava-13b-v0-4bit-128g.safetensors
C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\modeling_utils.py:4193: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
23:13:21-655309 INFO     LOADER: "GPTQ-for-LLaMa"
23:13:21-658307 INFO     TRUNCATION LENGTH: 2048
23:13:21-659305 INFO     INSTRUCTION TEMPLATE: "LLaVA"
23:13:21-660305 INFO     Loaded the model in 11.42 seconds.
Traceback (most recent call last):
  File "C:\Tools\text-generation-webui-oobabooga\modules\callbacks.py", line 61, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 397, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\generation\utils.py", line 1592, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\generation\utils.py", line 2696, in sample
    outputs = self(
              ^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1176, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1019, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 740, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 639, in forward
    query_states = self.q_proj(hidden_states)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gptq_for_llama\gptq_old\quant.py", line 426, in forward
    quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
    ^^^^^^^^^^
NameError: name 'quant_cuda' is not defined
Output generated in 0.59 seconds (0.00 tokens/s, 0 tokens, context 71, seed 226787340)

System Info

GPU: RTX 3060, 6GB VRAM
CPU: i7 11th gen, 64GB RAM
OS: Windows 11
CUDA: 12
masbicudo commented 7 months ago

What I have found so far is that an import is failing in the quant.py file:

import quant_cuda_faster as quant_cuda

I then created a test file containing just this import so I could debug it. At first, I got the error:

Exception has occurred: ImportError
DLL load failed while importing quant_cuda_faster: The specified module could not be found.

Then I found out that the directory containing a module's DLL dependencies needs to be on the trusted DLL search path for the import to work. I added env\Lib\site-packages\torch\lib to that list, since the code of quant_cuda_faster depends on torch:

import os

# make torch's bundled DLLs visible before importing the extension
os.add_dll_directory(r"C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\lib")
import quant_cuda_faster as quant_cuda

The error then changed to:

Exception has occurred: ImportError
DLL load failed while importing quant_cuda_faster: The specified procedure could not be found.

I have no idea how to fix this, since the error does not give any indication of which procedure is missing. I can only suppose that it is a torch version conflict, where quant_cuda_faster was built against a version of torch different from the one that is installed.
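
For anyone trying to reproduce this outside the webui, a minimal check along these lines should print the torch build in use and hit the same import failure (the path and the module name quant_cuda_faster are from my setup; run it from within cmd_windows.bat so the webui environment is active):

import os
import importlib

import torch

# report the torch build that the webui environment is actually using
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda,
      "| CUDA available:", torch.cuda.is_available())

# make torch's bundled DLLs findable, then attempt the same import as quant.py
os.add_dll_directory(r"C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\lib")
try:
    quant_cuda = importlib.import_module("quant_cuda_faster")
    print("quant_cuda_faster imported OK from", quant_cuda.__file__)
except ImportError as e:
    print("import failed:", e)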

HamedEmine commented 7 months ago

I have the same issue here

masbicudo commented 7 months ago

I executed python -m torch.utils.collect_env from within cmd_windows.bat, with the result below. It shows that CUDA is available.

PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home Single Language
GCC version: (x86_64-posix-seh-rev0, Built by MinGW-Builds project) 13.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.8 | packaged by Anaconda, Inc. | (main, Feb 26 2024, 21:34:05) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 551.76
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2304
DeviceID=CPU0
Family=198
L2CacheSize=10240
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2304
Name=11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.2.1+cu121
[pip3] torchaudio==2.2.1+cu121
[pip3] torchvision==0.17.1+cu121
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] torch                     2.2.1+cu121              pypi_0    pypi
[conda] torchaudio                2.2.1+cu121              pypi_0    pypi
[conda] torchvision               0.17.1+cu121             pypi_0    pypi
HamedEmine commented 7 months ago

Hello, I was able to resolve this by using "ExLlamav2_HF" as the loader instead of "GPTQ-for-LLaMa". Make sure to click Save Settings so it uses that loader the next time it launches.
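
(If you launch from the command line instead of the UI, I believe the equivalent is the --loader flag, e.g. python server.py --loader exllamav2_hf, but check python server.py --help for the exact accepted names in your version.)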

masbicudo commented 7 months ago

Hello, I was able to resolve this by using "ExLlamav2_HF" as the loader instead of "GPTQ-for-LLaMa". Make sure to click Save Settings so it uses that loader the next time it launches.

@HamedEmine Yeah, I could also load this model using ExLlamav2_HF. Thanks for pointing me to this solution. It can answer text questions, but unfortunately it raised an error when I tried to send it a picture. My intention is to use the multimodal extension. I was following the instructions on the multimodal extension page, which is why I was trying to use the wojtab_llava-13b-v0-4bit-128g model.

This is the error when I try to input images (AttributeError: 'Exllamav2HF' object has no attribute 'model'):

Traceback (most recent call last):
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\queueing.py", line 501, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 258, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1684, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1262, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 574, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 567, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 550, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 733, in gen_wrapper
    response = next(iterator)
               ^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\chat.py", line 414, in generate_chat_reply_wrapper
    for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\chat.py", line 382, in generate_chat_reply
    for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\chat.py", line 325, in chatbot_wrapper
    for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 33, in generate_reply
    for result in _generate_reply(*args, **kwargs):
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 85, in _generate_reply
    for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 324, in generate_reply_HF
    question, input_ids, inputs_embeds = apply_extensions('tokenizer', state, question, input_ids, None)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\extensions.py", line 231, in apply_extensions
    return EXTENSION_MAP[typ](*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\extensions.py", line 134, in _apply_tokenizer_extensions
    prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\script.py", line 90, in tokenizer_modifier
    prompt, input_ids, input_embeds, total_embedded = multimodal_embedder.forward(prompt, state, params)
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\multimodal_embedder.py", line 172, in forward
    prompt_parts = self._embed(prompt_parts)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\multimodal_embedder.py", line 154, in _embed
    parts[i].embedding = self.pipeline.embed_tokens(part.input_ids)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\pipelines\instructblip-pipeline\instructblip_pipeline.py", line 42, in embed_tokens
    return shared.model.model.embed_tokens(input_ids).to(shared.model.device, dtype=shared.model.dtype)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1688, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Exllamav2HF' object has no attribute 'model'
masbicudo commented 7 months ago

I was able to load the model and use the multimodal extension with the ExLlamav2 loader.
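
For reference, the combination that works for me is roughly the following (model name from my setup; flag names as I understand them in the current version, so verify against --help):

python server.py --model wojtab_llava-13b-v0-4bit-128g --loader exllamav2 --multimodal-pipeline llava-13b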

github-actions[bot] commented 5 months ago

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.