What I have found so far is an import that is failing in the quant.py file:
import quant_cuda_faster as quant_cuda
I then created a test file containing just this import, to debug it.
First, I got the error:
Exception has occurred: ImportError
DLL load failed while importing quant_cuda_faster: The specified module could not be found.
Then I found out that, for the module's DLL dependencies to load, their directory needs to be added to the trusted DLL search path with os.add_dll_directory. I added env\Lib\site-packages\torch\lib to that list, since I found out that quant_cuda_faster depends on torch.
import os
os.add_dll_directory(r"C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\lib")
import quant_cuda_faster as quant_cuda
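As a side note, a slightly less brittle version of the same test (just a sketch, assuming torch imports cleanly in the same environment and quant_cuda_faster is installed) derives the DLL directory from the installed torch package instead of hardcoding the path:
import os
import torch  # the torch package ships the DLLs that quant_cuda_faster links against

# Register torch's bundled DLL directory before importing the compiled extension,
# so Windows can resolve its torch/c10/CUDA dependencies without a hardcoded path.
torch_lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
os.add_dll_directory(torch_lib_dir)

import quant_cuda_faster as quant_cuda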
The error now changes to:
Exception has occurred: ImportError
DLL load failed while importing quant_cuda_faster: The specified procedure could not be found.
I have no idea how to fix this, since the error gives no indication of which procedure is missing. I can only suppose that it is a torch version conflict, where quant_cuda_faster expects a version of torch different from the one that is installed.
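One way to narrow this down, at least on the installed side of the comparison, is to print the torch version and build details and compare them against the torch/CUDA combination the prebuilt quant_cuda_faster binary was compiled for (a diagnostic sketch only):
import torch

print(torch.__version__)        # installed torch build, e.g. 2.2.1+cu121
print(torch.version.cuda)       # CUDA toolkit this torch build was compiled against
print(torch.__config__.show())  # compiler and ABI details of the torch build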
I have the same issue here
I executed python -m torch.utils.collect_env in the context of cmd_windows.bat, with the result below. It shows that CUDA is installed.
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Home Single Language
GCC version: (x86_64-posix-seh-rev0, Built by MinGW-Builds project) 13.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.11.8 | packaged by Anaconda, Inc. | (main, Feb 26 2024, 21:34:05) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 551.76
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2304
DeviceID=CPU0
Family=198
L2CacheSize=10240
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2304
Name=11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
ProcessorType=3
Revision=
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.2.1+cu121
[pip3] torchaudio==2.2.1+cu121
[pip3] torchvision==0.17.1+cu121
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.2.1+cu121 pypi_0 pypi
[conda] torchaudio 2.2.1+cu121 pypi_0 pypi
[conda] torchvision 0.17.1+cu121 pypi_0 pypi
Hello, I was able to resolve this by using "ExLlamav2_HF" as the loader instead of "GPTQ-for-LLaMa". Make sure to click Save Settings so it uses that loader the next time it launches.
@HamedEmine Yeah, I could also load this model using ExLlamav2_HF. Thanks for pointing me to this solution. It can answer text questions, but unfortunately it raised an error when I tried to send it a picture. My intention is to use the multimodal extension; I was following the instructions on the multimodal extension page, which is why I was trying to use the wojtab_llava-13b-v0-4bit-128g model.
This is the error when I try to input images (AttributeError: 'Exllamav2HF' object has no attribute 'model'):
Traceback (most recent call last):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\queueing.py", line 501, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 258, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1684, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1262, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 574, in async_iteration
return await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 567, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 550, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 733, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\chat.py", line 414, in generate_chat_reply_wrapper
for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\chat.py", line 382, in generate_chat_reply
for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\chat.py", line 325, in chatbot_wrapper
for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 33, in generate_reply
for result in _generate_reply(*args, **kwargs):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 85, in _generate_reply
for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 324, in generate_reply_HF
question, input_ids, inputs_embeds = apply_extensions('tokenizer', state, question, input_ids, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\extensions.py", line 231, in apply_extensions
return EXTENSION_MAP[typ](*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\extensions.py", line 134, in _apply_tokenizer_extensions
prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\script.py", line 90, in tokenizer_modifier
prompt, input_ids, input_embeds, total_embedded = multimodal_embedder.forward(prompt, state, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\multimodal_embedder.py", line 172, in forward
prompt_parts = self._embed(prompt_parts)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\multimodal_embedder.py", line 154, in _embed
parts[i].embedding = self.pipeline.embed_tokens(part.input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\pipelines\instructblip-pipeline\instructblip_pipeline.py", line 42, in embed_tokens
return shared.model.model.embed_tokens(input_ids).to(shared.model.device, dtype=shared.model.dtype)
^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1688, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Exllamav2HF' object has no attribute 'model'
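The traceback shows the multimodal pipeline calling shared.model.model.embed_tokens, i.e. it assumes a Transformers-style wrapper that exposes an inner .model, which the Exllamav2HF loader apparently does not. A defensive check along these lines (purely a hypothetical sketch, not the extension's actual code) would at least turn the AttributeError into a clearer message:
from modules import shared  # the webui's shared state, as used in the traceback above

# Hypothetical guard, not the extension's real code: getattr with a default
# swallows the AttributeError raised by the Exllamav2HF wrapper.
inner_model = getattr(shared.model, "model", None)
embed_tokens = getattr(inner_model, "embed_tokens", None)
if embed_tokens is None:
    raise RuntimeError(
        "The active loader does not expose model.embed_tokens; "
        "the multimodal pipeline needs a Transformers-style model object."
    )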
I was able to load the model and use the multimodal extension using the ExLlamav2 loader.
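For anyone else hitting this: I selected the loader in the UI, but the equivalent command-line launch should look roughly like the line below (the exact flag spellings depend on the webui version, so treat this as an assumption rather than a verified invocation):
python server.py --model wojtab_llava-13b-v0-4bit-128g --loader ExLlamav2 --multimodal-pipeline llava-13b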
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
I am trying to use the multimodal model wojtab_llava-13b-v0-4bit-128g on Windows using CUDA. (Further developments of this issue are in my comments below.)
Is there an existing issue for this?
Reproduction
I used the installer start_windows.bat. Then I restarted with the option --multimodal-pipeline llava-13b. I downloaded the model wojtab/llava-13b-v0-4bit-128g and loaded it with GPTQ-for-LLaMa using:
After clicking the Load button, it loads as if everything is OK, but the AI does not give any responses.
I noted that, in the console output, the message CUDA extension not installed appears twice. Also, sending requests to the AI results in an error: NameError: name 'quant_cuda' is not defined.
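The two messages fit together if the import in quant.py is wrapped in a try/except that only prints a warning, so the failure only surfaces later as a NameError once quant_cuda is actually used; a hypothetical sketch of that pattern (not the project's exact code) is:
# Hypothetical sketch of the import pattern that would produce both symptoms;
# not the project's exact code.
try:
    import quant_cuda_faster as quant_cuda
except ImportError:
    print("CUDA extension not installed")  # only a warning; quant_cuda is never defined

# Later, at generation time, any use of quant_cuda then fails with
# NameError: name 'quant_cuda' is not defined.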
Screenshot
No response
Logs
System Info