Open lovebeatz opened 2 weeks ago
Do you get any warnings before that?
Also, what version of Torch are you using, what CUDA version, etc.?
here are the steps I followed: create new conda enviornment with python 3.11 installed exllamav2 from wheel
am I missing out on separate installation of torch or cuda? (torch.cuda.empty_cache() works and clears L4 GPU memory, running on linux) because installation goes well, only when you import, it shows 'rich' error and after installing rich, exllamav2 is the next issue lined up also. clone repo method installation fails at pip install . with some error (both whether torch with cuda installed or not)
It's strange, because the installation should fail if it isn't able to install the exllamav2_ext
module, and running without that module installed should prompt it to build it at runtime, and if that fails, then you should get an error from torch.utils.cpp_extension.load
. It seems that in both cases, that latter function is failing silently for some reason.
CUDA toolkit is required for building from source, as is a CUDA-enabled version of PyTorch. But I think there's something up with your env perhaps, since rich
should be a dependency for the wheel and is in requirements.txt. What's your exact PyTorch version? (pip show torch
)
You could try setting verbose = True
at the top of ext.py. That should at least give you output from when it tries to compile the extension.
Will ssh into the server tomorrow, I think if I install from the wheel, then I can't setup verbose, the only change I can make is installing torch with cuda before I install from wheel
here's something I tested, I simply installed pytorch 2.3.1 with cuda 12.1 via conda before ran pip install exllamav2 via whl
rich still needs to be installed separately (error shows up when you run the code not while installation), and no change in behavior regarding exllamav2_ext, so it's clear that torch doesn't need to be installed separately, as this time this didn't make much time to install from whl, without torch, it takes care of torch but takes time to install
code I ran
from exllamav2.generator import ( ExLlamaV2Sampler, )
error I got
NameError Traceback (most recent call last) Cell In[2], line 1 ----> 1 from exllamav2.generator import ( 2 ExLlamaV2Sampler, 3 )
File ~/miniconda3/envs/agentic/lib/python3.11/site-packages/exllamav2/init.py:3 1 from exllamav2.version import version ----> 3 from exllamav2.model import ExLlamaV2 4 from exllamav2.cache import ExLlamaV2CacheBase 5 from exllamav2.cache import ExLlamaV2Cache
File ~/miniconda3/envs/agentic/lib/python3.11/site-packages/exllamav2/model.py:31 28 print("") 30 import math ---> 31 from exllamav2.config import ExLlamaV2Config 32 from exllamav2.cache import ExLlamaV2CacheBase 33 from exllamav2.linear import ExLlamaV2Linear
File ~/miniconda3/envs/agentic/lib/python3.11/site-packages/exllamav2/config.py:5 3 import torch 4 import math ----> 5 from exllamav2.fasttensors import STFile 6 from exllamav2.architecture import ExLlamaV2ArchParams 7 import os, glob, json
File ~/miniconda3/envs/agentic/lib/python3.11/site-packages/exllamav2/fasttensors.py:6 4 import numpy as np 5 import json ----> 6 from exllamav2.ext import exllamav2_ext as ext_c 7 import os 9 def convert_dtype(dt: str):
File ~/miniconda3/envs/agentic/lib/python3.11/site-packages/exllamav2/ext.py:281 278 timer.cancel() 279 end_build_feedback() --> 281 ext_c = exllamav2_ext 284 # Dummy tensor to pass to C++ extension in place of None/NULL 286 none_tensor = torch.empty((1, 1), device = "meta")
NameError: name 'exllamav2_ext' is not defined
But this would imply that all the code before line 281 in ext.py runs. It includes:
build_jit = False
try:
import exllamav2_ext
except ModuleNotFoundError:
build_jit = True
except ImportError as e:
if "undefined symbol" in str(e):
print("\"undefined symbol\" error here usually means you are attempting to load a prebuilt extension wheel "
"that was compiled against a different version of PyTorch than the one you are you using. Please verify "
"that the versions match.")
raise e
Which should either define the exllamav2_ext
symbol, or set build_jit = True
, or raise if there's any other exception besides ModuleNotFoundError
. Then, if build_jit is true, this runs later down:
exllamav2_ext = load \
(
name = extension_name,
sources = sources,
extra_include_paths = [sources_dir],
verbose = verbose,
extra_ldflags = extra_ldflags,
extra_cuda_cflags = extra_cuda_cflags,
extra_cflags = extra_cflags
)
Which once again either raises an exception or defines the exllamav2_ext
symbol. And yet somehow that executes without defining the symbol so you get an error right after:
ext_c = exllamav2_ext
Maybe I've just stared at it for too long, but I'm not seeing any error in the logic there.
As of now, the only way out for me is looking for a previous version where this doesn't happen, not feeling very motivated for this, because I believe you would fix it
Problem is I can't reproduce the error. The only time I've ever seen something like it is when someone has a wrong version of Torch installed for their hardware, but that doesn't seem to be the case here.
I can suggest you maybe try clearing out the extension cache directory at ~/.cache/torch_extensions
, since maybe there are some corrupted build files there. Other than that, if you could provide some more details about your hardware and what library versions you have installed:
nvidia-smi
pip show torch exllamav2
nvcc --version
gcc --version
I tried it on servers, L4 and 3090 I create the test environments, whatever doesn't work, I delete them, on your ask, I refollowed the process with manual pre-installation of pytorch using conda, 2.3.1 and 12.1 cuda
This is after direct wheel install with python 3.9, next I would try with pytorch=2.0.1 and cuda 11.8
Sat Jun 15 03:03:26 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:00:05.0 Off | N/A |
| 0% 27C P8 18W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
Name: exllamav2 Version: 0.1.5+cu117.torch2.0.1 Summary: Home-page: https://github.com/turboderp/exllamav2 Author: turboderp Author-email: License: MIT Location: /home/ubuntu/miniconda3/envs/test/lib/python3.9/site-packages Requires: fastparquet, ninja, numpy, pandas, pygments, regex, safetensors, sentencepiece, torch, websockets Required-by: /bin/bash: line 1: nvcc: command not found gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
For that PyTorch version at least, you'll want this wheel:
pip install -U https://github.com/turboderp/exllamav2/releases/download/v0.1.5/exllamav2-0.1.5+cu121.torch2.3.1-cp311-cp311-linux_x86_64.whl
Assuming you're still on Python 3.11
Also, if I clone repo and follow the steps with python 3.11,
pip install .
(exllama) ubuntu@gc-vigilant-exllama:~/libraries/exllamav2$ pip install . Processing /home/ubuntu/libraries/exllamav2 Preparing metadata (setup.py) ... error error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [14 lines of output]
Traceback (most recent call last):
File "
note: This error originates from a subprocess, and is likely not a problem with pip. error: metadata-generation-failed
× Encountered error while generating package metadata. ╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hardware details
Sat Jun 15 03:23:31 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:00:05.0 Off | N/A |
| 0% 27C P8 18W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | No running processes found | +---------------------------------------------------------------------------------------+ [33mWARNING: Package(s) not found: exllamav2[0m[33m [0mName: torch Version: 2.3.1 Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration Home-page: https://pytorch.org/ Author: PyTorch Team Author-email: packages@pytorch.org License: BSD-3 Location: /home/ubuntu/miniconda3/envs/exllama/lib/python3.11/site-packages Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions Required-by: /bin/bash: line 1: nvcc: command not found gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
That error is from not having the CUDA toolkit installed, which is needed for building from source. But try with a cu121 wheel to match your PyTorch version.
For that PyTorch version at least, you'll want this wheel:
pip install -U https://github.com/turboderp/exllamav2/releases/download/v0.1.5/exllamav2-0.1.5+cu121.torch2.3.1-cp311-cp311-linux_x86_64.whl
Assuming you're still on Python 3.11
installed this wheel https://github.com/turboderp/exllamav2/releases/download/v0.1.5/exllamav2-0.1.5+cu118.torch2.3.1-cp312-cp312-linux_x86_64.whl while on python 3.12
from exllamav2.generator import ExLlamaV2StreamingGenerator, ExLlamaV2Sampler
{ "name": "AttributeError", "message": "module 'torch' has no attribute 'version'", "stack": "--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[1], line 1 ----> 1 from exllamav2.generator import ExLlamaV2StreamingGenerator, ExLlamaV2Sampler
File ~/miniconda3/envs/exllama/lib/python3.12/site-packages/exllamav2/init.py:3 1 from exllamav2.version import version ----> 3 from exllamav2.model import ExLlamaV2 4 from exllamav2.cache import ExLlamaV2CacheBase 5 from exllamav2.cache import ExLlamaV2Cache
File ~/miniconda3/envs/exllama/lib/python3.12/site-packages/exllamav2/model.py:25 15 # # Set cudaMallocAsync allocator by default as it appears slightly more memory efficient, unless Torch is already 16 # # imported in which case changing the allocator would cause it to crash 17 # if not \"PYTORCH_CUDA_ALLOC_CONF\" in os.environ: (...) 20 # except NameError: 21 # os.environ[\"PYTORCH_CUDA_ALLOC_CONF\"] = \"backend:cudaMallocAsync\" 23 import torch ---> 25 if not (torch.version.cuda or torch.version.hip): 26 print(\"\") 27 print(f\" ## Warning: The installed version of PyTorch is {torch.version} and does not support CUDA or ROCm.\")
AttributeError: module 'torch' has no attribute 'version'" }
Finally, wheel install worked manually installed pytorch 2,3.1 with cuda 12.1 and python 3.11 https://github.com/turboderp/exllamav2/releases/download/v0.1.5/exllamav2-0.1.5+cu121.torch2.3.1-cp311-cp311-linux_x86_64.whl
tokenizers, rich require separate pip install
also, what's your say for tabby API? is it regularly updated? what would be the best way to serve exl2 models, via exllamav2 langchain integration or tabbyAPI via openAI langchain?
Also, anything you can tell for using chatml prompt template via exllamav2 want to server hermes-2-pro/hermes-2-theta
Finally, wheel install worked manually installed pytorch 2,3.1 with cuda 12.1 and python 3.11 https://github.com/turboderp/exllamav2/releases/download/v0.1.5/exllamav2-0.1.5+cu121.torch2.3.1-cp311-cp311-linux_x86_64.whl
tokenizers, rich require separate pip install
also, what's your say for tabby API? is it regularly updated? what would be the best way to serve exl2 models, via exllamav2 langchain integration or tabbyAPI via openAI langchain?
so if anyone wants to go directly with wheel install without manual torch install, the workaround is here so whatever wheel one picks, only a certain torch/cuda version is install by default, so if one picks the wheel which matches the version the default torch install, it shows no error recently one that worked is, with python 3.11 and no manual torch install https://github.com/turboderp/exllamav2/releases/download/v0.1.5/exllamav2-0.1.5+cu121.torch2.3.1-cp311-cp311-linux_x86_64.whl
also, how to use flash attention?
Just install it, and it will be used by default. pip install flash-attn
what's your say for tabby API? is it regularly updated? what would be the best way to serve exl2 models, via exllamav2 langchain integration or tabbyAPI via openAI langchain? & anything you can tell for using chatml prompt template via exllamav2 want to server hermes-2-pro/hermes-2-theta
Tabby is still alive and well, getting frequent updates. I don't really have an opinion on LangChain as I've never used it (or found much use for it) but Tabby provides an OAI-compatible endpoint so you can use it with whatever frontend or framework supports that.
https://github.com/turboderp/exllamav2/blob/5996922a0f0937aa503efa773780f1648915d73e/exllamav2/ext.py#L281
Error is name 'exllamav2_ext' is not defined
install method is build from source, from the cp311 whl provided, cp310 also tried in a different conda envirionment
didn't try any old version, looking for a reliable way to serve LLMs
also, 'rich' needs to be installed separately, as of now, shall be included in the whl