ewebgh33 opened 11 months ago
Look at the Torch documentation, it says 12.1 :)
Get the right torch package here -> https://pytorch.org/get-started/locally/
Thanks, I didn't see it in the documentation; I just looked at the main GitHub page and there is no link to the docs there. Maybe add a line under "Running locally" on the main repo page: "You will need CUDA 12.1 and etc etc, then git clone etc." :)
Will 12.3 work as well? I'm doing a system update and it seems like I should get on the latest; other LLM apps need 12.2, and I hope 12.3 will at least get me through the next few months!
No, I don't think so. Maybe you can customise torch through some configs, but out of the box (as far as I know) that's not possible.
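For what it's worth, what has to match is the CUDA version the installed Torch wheel was built against, not whatever toolkit happens to be on the system. A quick way to check that (it prints the wheel's CUDA version, or None for a CPU-only build):

python -c "import torch; print(torch.version.cuda)"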
Alright, so I uninstalled CUDA and installed 12.1. Checked the Windows environment variables, all seems OK. Deleted the conda env to start fresh, set up a new one, pip installed the requirements, ran server.py.
Error:
(exui) C:\AI\Text\exui>python server.py
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1'
Traceback (most recent call last):
File "C:\AI\Text\exui\server.py", line 11, in <module>
from backend.models import update_model, load_models, get_model_info, list_models, remove_model, load_model, unload_model, get_loaded_model
File "C:\AI\Text\exui\backend\models.py", line 5, in <module>
from exllamav2 import(
File "C:\Users\ComputeyName\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\__init__.py", line 3, in <module>
from exllamav2.model import ExLlamaV2
File "C:\Users\ComputeyName\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\model.py", line 17, in <module>
from exllamav2.cache import ExLlamaV2CacheBase
File "C:\Users\ComputeyName\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\cache.py", line 2, in <module>
from exllamav2.ext import exllamav2_ext as ext_c
File "C:\Users\ComputeyName\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\ext.py", line 131, in <module>
exllamav2_ext = load \
File "C:\Users\ComputeyName\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1308, in load
return _jit_compile(
File "C:\Users\ComputeyName\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1710, in _jit_compile
_write_ninja_file_and_build_library(
File "C:\Users\ComputeyName\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1810, in _write_ninja_file_and_build_library
_write_ninja_file_to_build_library(
File "C:\Users\ComputeyName\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2199, in _write_ninja_file_to_build_library
cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
File "C:\Users\ComputeyName\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1980, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range
Why would it not find it when it's exactly where it says it's looking? Because nvcc --version shows:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
It's there, right? The driver also supports it; I did check that too.
The failure seems to happen in PyTorch, at this point:
arch_list = []
# the assumption is that the extension should run on any of the currently visible cards,
# which could be of different types - therefore all archs for visible cards should be included
for i in range(torch.cuda.device_count()):
    capability = torch.cuda.get_device_capability(i)
    supported_sm = [int(arch.split('_')[1])
                    for arch in torch.cuda.get_arch_list() if 'sm_' in arch]
    max_supported_sm = max((sm // 10, sm % 10) for sm in supported_sm)
    # Capability of the device may be higher than what's supported by the user's
    # NVCC, causing compilation error. User's NVCC is expected to match the one
    # used to build pytorch, so we use the maximum supported capability of pytorch
    # to clamp the capability.
    capability = min(max_supported_sm, capability)
    arch = f'{capability[0]}.{capability[1]}'
    if arch not in arch_list:
        arch_list.append(arch)
arch_list = sorted(arch_list)
arch_list[-1] += '+PTX'
It fails on the last line, indexing the last element of arch_list, which means that list is empty. The only way I can see that happening is if torch.cuda.device_count() is zero, i.e. Torch has not recognized any CUDA devices.
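A quick way to confirm that from inside the same environment, using only standard Torch calls (nothing exui-specific):

import torch
print(torch.__version__)           # should end in +cu121 for the CUDA 12.1 wheel
print(torch.cuda.is_available())   # False means Torch can't see the CUDA runtime
print(torch.cuda.device_count())   # 0 is exactly what leaves arch_list empty above
print(torch.cuda.get_arch_list())  # empty list on a CPU-only build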
Are you sure you have the CUDA-enabled version of Torch installed? pip freeze should show torch==...+cu121, I believe. The "No CUDA runtime is found" error is also emitted by Torch, so it does look like you have the CUDA version, but it really can't find the CUDA runtime, which would be provided by the NVIDIA driver. I don't know if maybe that's not installed, or not available somehow? Do you get anything from running nvidia-smi?
for "pip show torch":
Name: torch
Version: 2.1.2
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\users\hesperos\appdata\local\programs\python\python310\lib\site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: exllamav2
pip freeze shows torch==2.1.2, and in Anaconda it shows torch==2.1.0+cu121, but then once the environment is active Anaconda also shows torch==2.1.2.
nvidia-smi shows
NVIDIA-SMI 546.12 Driver Version: 546.12 CUDA Version: 12.3
And the 2x 4090s.
Of course, the 12.3 there just means that the driver is compatible up to that version of CUDA, as you know.
So that's the weird thing: torch is there, CUDA is there, etc. Or is it? I need a torch install inside the env, but it wasn't in the requirements... it's a requirement though? Hm. Now, after installing torch, I get "No module named 'flask'", which I suppose means I'm supposed to have an environment variable for it. But if I run flask --version I get:
Python 3.10.8
Flask 3.0.0
Werkzeug 3.0.1
So I don't know whether I have it or not.
You definitely have the non-CUDA version of Torch. Why 2.1.0+cu121 shows up in Anaconda I don't know. In any case, I would do:
pip uninstall torch
pip install torch --index-url https://download.pytorch.org/whl/cu121
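After the reinstall, pip freeze should show the +cu121 suffix, and this one-liner should print True (just a sanity check, run inside the env):

python -c "import torch; print(torch.cuda.is_available())"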
There's definitely something up with your conda envs though. flask is a requirement and shouldn't need any environment variables. You should be able to install it with:
pip install flask waitress
(waitress being the next requirement it would likely not find if it isn't finding flask.)
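One thing worth checking while you're at it: flask --version may be answering from a different Python install entirely (e.g. the system one on PATH), not from the conda env. Running the import through the env's own interpreter removes that ambiguity; this just prints which interpreter and which flask install actually get picked up:

python -c "import sys, flask; print(sys.executable); print(flask.__file__)"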
Had the same issue today; installed from Anaconda:
conda create -n exui python==3.10
conda activate exui
pip install -r requirements.txt
And received the same error. To fix that:
pip uninstall torch  # then generate the right version of the pytorch install command on their site
pip install torch --index-url https://download.pytorch.org/whl/cu121
Now everything is working. I guess it would be nice to add a note to the installation guide for conda users.
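For reference, a sequence like this should avoid the problem in the first place; it's just the same steps reordered (assuming CUDA 12.1), so the CUDA-enabled torch wheel is already in place before requirements.txt gets a chance to pull in the CPU-only build:

conda create -n exui python==3.10
conda activate exui
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt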
I had this issue today, because I'm using 12.1 instead of 12.2. The readme currently refers to 12.1, but the install expects 12.2. We could be more generic about it to help other users troubleshoot, maybe: "Run pip freeze, make sure you have the version listed installed in (/.../), otherwise pip uninstall torch (...) pip install torch (...), or install the correct version of Torch from (...)."
True, something like a "currently supported versions" list would be helpful.
Hi, not sure what is happening here, but when I try to run python server.py, it says No CUDA runtime is found. Is a specific version needed? Which one?
i.e., I have 12.0 installed, with PATH set in the Win11 environment variables. I use 12.0 for 3D rendering and design.
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0
It says not found here though, so should I assume 12.0 is not the version we need? I also installed 12.3 inside the conda environment "exui", and it's not seeing that either. I thought the point of an environment was that it looks in there first, and then outside the env if the components it needs are not present. If I have to set the Windows PATH for every environment I run, I'll be forever switching the PATH all day.
Or is 12.3 too new? Do I need 12.2 instead?
This GUI/app looks really good, but I think the install instructions could be a bit more detailed and take into account an environment manager or two.