oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Can't load GPTQ model with ExLlamav2 #3900

Closed. Simplegram closed this issue 10 months ago.

Simplegram commented 12 months ago

Describe the bug

Can't load GPTQ model with ExLlamav2_HF and ExLlamav2. I have tried these two models:

Is there an existing issue for this?

Reproduction

Just load any model with ExLlamav2
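
For reference, an equivalent command-line reproduction (assuming the standard server.py --model and --loader flags; the model folder name below is only a placeholder, not one of the models actually tested) would look something like:

python server.py --model Some-Model-GPTQ --loader exllamav2_hf

The same failure happens when selecting ExLlamav2 or ExLlamav2_HF in the Model tab of the web UI.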

Conda list: 7zip 19.00 h2d74725_2 conda-forge absl-py 1.4.0 pypi_0 pypi accelerate 0.22.0 pypi_0 pypi aiofiles 23.1.0 pypi_0 pypi aiohttp 3.8.4 pypi_0 pypi aiosignal 1.3.1 pypi_0 pypi altair 4.2.2 pypi_0 pypi annotated-types 0.5.0 pypi_0 pypi antlr4-python3-runtime 4.9.3 pypi_0 pypi anyio 3.6.2 pypi_0 pypi appdirs 1.4.4 pypi_0 pypi asciitree 0.3.3 pypi_0 pypi async-timeout 4.0.2 pypi_0 pypi attrs 23.1.0 pypi_0 pypi auto-gptq 0.4.2+cu118 pypi_0 pypi backcall 0.2.0 pypi_0 pypi backoff 2.2.1 pypi_0 pypi beautifulsoup4 4.12.2 pypi_0 pypi bingimagecreator 0.4.4 pypi_0 pypi bitsandbytes 0.41.1 pypi_0 pypi blas 1.0 mkl defaults blinker 1.6.2 pypi_0 pypi boto3 1.26.145 pypi_0 pypi botocore 1.29.145 pypi_0 pypi brotlipy 0.7.0 py310h2bbff1b_1002 defaults bzip2 1.0.8 he774522_0 defaults ca-certificates 2023.5.7 h56e8100_0 conda-forge cachetools 5.3.1 pypi_0 pypi certifi 2022.12.7 pypi_0 pypi cffi 1.15.1 py310h2bbff1b_3 defaults chardet 5.1.0 pypi_0 pypi charset-normalizer 2.1.1 pypi_0 pypi chromadb 0.3.18 pypi_0 pypi click 8.1.3 pypi_0 pypi clickhouse-connect 0.5.23 pypi_0 pypi colorama 0.4.6 pypi_0 pypi coloredlogs 15.0.1 pypi_0 pypi contourpy 1.0.7 pypi_0 pypi cramjam 2.7.0 pypi_0 pypi cryptography 39.0.1 py310h21b164f_0 defaults ctransformers 0.2.27+cu117 pypi_0 pypi cuda-cccl 11.7.58 0 nvidia/label/cuda-11.7.0 cuda-command-line-tools 11.7.0 0 nvidia/label/cuda-11.7.0 cuda-compiler 11.7.0 0 nvidia/label/cuda-11.7.0 cuda-cudart 11.7.60 0 nvidia/label/cuda-11.7.0 cuda-cudart-dev 11.7.60 0 nvidia/label/cuda-11.7.0 cuda-cuobjdump 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-cupti 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-cuxxfilt 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-documentation 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-libraries 11.7.0 0 nvidia/label/cuda-11.7.0 cuda-libraries-dev 11.7.0 0 nvidia/label/cuda-11.7.0 cuda-memcheck 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-nsight-compute 11.7.0 0 nvidia/label/cuda-11.7.0 cuda-nvcc 11.7.64 0 nvidia/label/cuda-11.7.0 cuda-nvdisasm 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-nvml-dev 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-nvprof 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-nvprune 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-nvrtc 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-nvrtc-dev 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-nvtx 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-nvvp 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-runtime 11.7.0 0 nvidia/label/cuda-11.7.0 cuda-sanitizer-api 11.7.50 0 nvidia/label/cuda-11.7.0 cuda-toolkit 11.7.0 0 nvidia/label/cuda-11.7.0 cuda-tools 11.7.0 0 nvidia/label/cuda-11.7.0 cuda-visual-tools 11.7.0 0 nvidia/label/cuda-11.7.0 cudatoolkit-dev 11.7.0 h9f2f4db_6 conda-forge curl 8.1.1 h2bbff1b_1 defaults cycler 0.11.0 pypi_0 pypi dataproperty 0.55.1 pypi_0 pypi datasets 2.14.5 pypi_0 pypi decorator 5.1.1 pypi_0 pypi deep-translator 1.9.2 pypi_0 pypi dill 0.3.6 pypi_0 pypi discord-py 2.3.1 pypi_0 pypi diskcache 5.6.1 pypi_0 pypi docker-pycreds 0.4.0 pypi_0 pypi docopt 0.6.2 pypi_0 pypi duckdb 0.7.1 pypi_0 pypi edgegpt 0.13.2 pypi_0 pypi einops 0.6.1 pypi_0 pypi elevenlabs 0.2.26 pypi_0 pypi elevenlabslib 0.6.0 pypi_0 pypi encodec 0.1.1 pypi_0 pypi entrypoints 0.4 pypi_0 pypi exceptiongroup 1.1.1 pypi_0 pypi exllama 0.0.17+cu117 pypi_0 pypi exllamav2 0.0.1 pypi_0 pypi fastapi 0.95.2 pypi_0 pypi fasteners 0.18 pypi_0 pypi fastparquet 2023.8.0 pypi_0 pypi ffmpeg 1.4 pypi_0 pypi ffmpeg-python 0.2.0 pypi_0 pypi ffmpy 0.3.0 pypi_0 pypi filelock 3.9.0 py310haa95532_0 defaults flask 2.3.2 pypi_0 pypi flask-cloudflared 0.0.14 pypi_0 pypi flexgen 0.1.7 pypi_0 pypi fonttools 
4.39.3 pypi_0 pypi freetype 2.12.1 ha860e81_0 defaults frozenlist 1.3.3 pypi_0 pypi fsspec 2023.4.0 pypi_0 pypi funcy 2.0 pypi_0 pypi future 0.18.3 pypi_0 pypi giflib 5.2.1 h8cc25b3_3 defaults git 2.34.1 haa95532_0 defaults gitdb 4.0.10 pypi_0 pypi gitpython 3.1.31 pypi_0 pypi google-auth 2.22.0 pypi_0 pypi google-auth-oauthlib 1.0.0 pypi_0 pypi gptq-for-llama 0.1.0+cu117 pypi_0 pypi gptq-llama 0.2.2 pypi_0 pypi gradio 3.33.1 pypi_0 pypi gradio-client 0.2.5 pypi_0 pypi grpcio 1.56.2 pypi_0 pypi h11 0.14.0 pypi_0 pypi hnswlib 0.7.0 pypi_0 pypi httpcore 0.17.0 pypi_0 pypi httptools 0.5.0 pypi_0 pypi httpx 0.24.0 pypi_0 pypi huggingface-hub 0.16.4 pypi_0 pypi humanfriendly 10.0 pypi_0 pypi idna 3.4 py310haa95532_0 defaults iniconfig 2.0.0 pypi_0 pypi intel-openmp 2023.1.0 h59b6b97_46319 defaults ipython 8.14.0 pypi_0 pypi itsdangerous 2.1.2 pypi_0 pypi jedi 0.18.2 pypi_0 pypi jinja2 3.1.2 py310haa95532_0 defaults jmespath 1.0.1 pypi_0 pypi joblib 1.2.0 pypi_0 pypi jpeg 9e h2bbff1b_1 defaults jsonlines 3.1.0 pypi_0 pypi jsonschema 4.17.3 pypi_0 pypi kiwisolver 1.4.4 pypi_0 pypi lerc 3.0 hd77b12b_0 defaults libcublas 11.10.1.25 0 nvidia/label/cuda-11.7.0 libcublas-dev 11.10.1.25 0 nvidia/label/cuda-11.7.0 libcufft 10.7.2.50 0 nvidia/label/cuda-11.7.0 libcufft-dev 10.7.2.50 0 nvidia/label/cuda-11.7.0 libcurand 10.2.10.50 0 nvidia/label/cuda-11.7.0 libcurand-dev 10.2.10.50 0 nvidia/label/cuda-11.7.0 libcurl 8.1.1 h86230a5_1 defaults libcusolver 11.3.5.50 0 nvidia/label/cuda-11.7.0 libcusolver-dev 11.3.5.50 0 nvidia/label/cuda-11.7.0 libcusparse 11.7.3.50 0 nvidia/label/cuda-11.7.0 libcusparse-dev 11.7.3.50 0 nvidia/label/cuda-11.7.0 libdeflate 1.17 h2bbff1b_0 defaults libffi 3.4.2 hd77b12b_6 defaults libnpp 11.7.3.21 0 nvidia/label/cuda-11.7.0 libnpp-dev 11.7.3.21 0 nvidia/label/cuda-11.7.0 libnvjpeg 11.7.2.34 0 nvidia/label/cuda-11.7.0 libnvjpeg-dev 11.7.2.34 0 nvidia/label/cuda-11.7.0 libpng 1.6.39 h8cc25b3_0 defaults libssh2 1.10.0 hcd4344a_2 defaults libtiff 4.5.0 h6c2663c_2 defaults libuv 1.44.2 h2bbff1b_0 defaults libwebp 1.2.4 hbc33d0d_1 defaults libwebp-base 1.2.4 h2bbff1b_1 defaults linkify-it-py 2.0.2 pypi_0 pypi llama-cpp-python 0.1.85 pypi_0 pypi llama-cpp-python-cuda 0.1.85+cu117 pypi_0 pypi llvmlite 0.40.0 pypi_0 pypi lxml 4.9.3 pypi_0 pypi lz4 4.3.2 pypi_0 pypi lz4-c 1.9.4 h2bbff1b_0 defaults markdown 3.4.4 pypi_0 pypi markdown-it-py 2.2.0 pypi_0 pypi markupsafe 2.1.2 pypi_0 pypi matplotlib 3.7.1 pypi_0 pypi matplotlib-inline 0.1.6 pypi_0 pypi mbstrdecoder 1.1.2 pypi_0 pypi mdit-py-plugins 0.3.3 pypi_0 pypi mdurl 0.1.2 pypi_0 pypi mkl 2023.1.0 h8bd8f75_46356 defaults mkl-service 2.4.0 py310h2bbff1b_1 defaults mkl_fft 1.3.6 py310h4ed8f06_1 defaults mkl_random 1.2.2 py310h4ed8f06_1 defaults monotonic 1.6 pypi_0 pypi more-itertools 9.1.0 pypi_0 pypi mpmath 1.2.1 pypi_0 pypi multidict 6.0.4 pypi_0 pypi multiprocess 0.70.14 pypi_0 pypi mypy-extensions 1.0.0 pypi_0 pypi networkx 3.0 pypi_0 pypi ngrok 0.8.1 pypi_0 pypi ninja 1.11.1 pypi_0 pypi ninja-base 1.10.2 h6d14046_5 defaults nltk 3.8.1 pypi_0 pypi nsight-compute 2022.2.0.13 0 nvidia/label/cuda-11.7.0 num2words 0.5.12 pypi_0 pypi numba 0.57.0 pypi_0 pypi numcodecs 0.11.0 pypi_0 pypi numexpr 2.8.4 pypi_0 pypi numpy 1.24.1 pypi_0 pypi oauthlib 3.2.2 pypi_0 pypi omegaconf 2.3.0 pypi_0 pypi oobabot 0.2.0 pypi_0 pypi oobabot-plugin 0.2.0 pypi_0 pypi openai 0.27.7 pypi_0 pypi openai-whisper 20230314 pypi_0 pypi openssl 1.1.1u h2bbff1b_0 defaults optimum 1.13.1 pypi_0 pypi orjson 3.8.11 pypi_0 pypi packaging 23.1 pypi_0 pypi pandas 2.1.0 
pypi_0 pypi pathtools 0.1.2 pypi_0 pypi pathvalidate 2.5.2 pypi_0 pypi peft 0.5.0 pypi_0 pypi pickleshare 0.7.5 pypi_0 pypi pillow 9.3.0 pypi_0 pypi pip 23.2.1 pypi_0 pypi pluggy 1.0.0 pypi_0 pypi portalocker 2.7.0 pypi_0 pypi posthog 2.4.2 pypi_0 pypi prompt-toolkit 3.0.38 pypi_0 pypi protobuf 4.23.1 pypi_0 pypi psutil 5.9.5 pypi_0 pypi pulp 2.7.0 pypi_0 pypi py-cpuinfo 9.0.0 pypi_0 pypi pyarrow 12.0.0 pypi_0 pypi pyasn1 0.5.0 pypi_0 pypi pyasn1-modules 0.3.0 pypi_0 pypi pybind11 2.10.4 pypi_0 pypi pycountry 22.3.5 pypi_0 pypi pycparser 2.21 pyhd3eb1b0_0 defaults pydantic 1.10.12 pypi_0 pypi pydantic-core 2.6.3 pypi_0 pypi pydub 0.25.1 pypi_0 pypi pygments 2.15.1 pypi_0 pypi pynacl 1.5.0 pypi_0 pypi pyopenssl 23.0.0 py310haa95532_0 defaults pyparsing 3.0.9 pypi_0 pypi pyre-extensions 0.0.29 pypi_0 pypi pyreadline3 3.4.1 pypi_0 pypi pyrsistent 0.19.3 pypi_0 pypi pysbd 0.3.4 pypi_0 pypi pysocks 1.7.1 py310haa95532_0 defaults pytablewriter 0.64.2 pypi_0 pypi pytest 7.2.2 pypi_0 pypi python 3.10.11 h966fe2a_2 defaults python-dotenv 1.0.0 pypi_0 pypi python-multipart 0.0.6 pypi_0 pypi pytorch-cuda 11.7 h16d0643_3 pytorch pytorch-mutex 1.0 cuda pytorch pytz 2023.3 pypi_0 pypi pyyaml 6.0.1 pypi_0 pypi quant-cuda 0.0.0 pypi_0 pypi regex 2023.5.5 pypi_0 pypi requests 2.28.1 pypi_0 pypi requests-oauthlib 1.3.1 pypi_0 pypi responses 0.18.0 pypi_0 pypi rich 13.3.5 pypi_0 pypi rouge 1.0.1 pypi_0 pypi rouge-score 0.1.2 pypi_0 pypi rsa 4.9 pypi_0 pypi ruamel-yaml 0.17.32 pypi_0 pypi ruamel-yaml-clib 0.2.7 pypi_0 pypi rwkv 0.7.3 pypi_0 pypi s3transfer 0.6.1 pypi_0 pypi sacrebleu 1.5.0 pypi_0 pypi safetensors 0.3.2 pypi_0 pypi scikit-learn 1.2.2 pypi_0 pypi scipy 1.11.2 pypi_0 pypi semantic-version 2.10.0 pypi_0 pypi sentence-transformers 2.2.2 pypi_0 pypi sentencepiece 0.1.99 pypi_0 pypi sentry-sdk 1.23.1 pypi_0 pypi setproctitle 1.3.2 pypi_0 pypi setuptools 67.8.0 pypi_0 pypi smmap 5.0.0 pypi_0 pypi sniffio 1.3.0 pypi_0 pypi socksio 1.0.0 pypi_0 pypi sounddevice 0.4.6 pypi_0 pypi soundfile 0.12.1 pypi_0 pypi soupsieve 2.4.1 pypi_0 pypi speechrecognition 3.10.0 pypi_0 pypi sqlite 3.41.2 h2bbff1b_0 defaults sqlitedict 2.1.0 pypi_0 pypi starlette 0.27.0 pypi_0 pypi suno-bark 0.0.1a0 pypi_0 pypi sympy 1.11.1 py310haa95532_0 defaults tabledata 1.3.1 pypi_0 pypi tbb 2021.8.0 h59b6b97_0 defaults tcolorpy 0.1.3 pypi_0 pypi tensorboard 2.14.0 pypi_0 pypi tensorboard-data-server 0.7.1 pypi_0 pypi threadpoolctl 3.1.0 pypi_0 pypi tiktoken 0.3.1 pypi_0 pypi timm 0.6.13 pypi_0 pypi tk 8.6.12 h2bbff1b_0 defaults tokenizers 0.13.3 pypi_0 pypi tomli 2.0.1 pypi_0 pypi toolz 0.12.0 pypi_0 pypi torch 2.0.1+cu118 pypi_0 pypi torchaudio 2.0.2+cu118 pypi_0 pypi torchvision 0.15.2+cu118 pypi_0 pypi tqdm 4.66.1 pypi_0 pypi tqdm-multiprocess 0.0.11 pypi_0 pypi transformers 4.33.1 pypi_0 pypi typepy 1.3.0 pypi_0 pypi typing 3.7.4.3 pypi_0 pypi typing-extensions 4.4.0 pypi_0 pypi typing-inspect 0.8.0 pypi_0 pypi tzdata 2023.3 pypi_0 pypi uc-micro-py 1.0.2 pypi_0 pypi urllib3 1.26.13 pypi_0 pypi uvicorn 0.22.0 pypi_0 pypi vc 14.2 h21ff451_1 defaults vs2015_runtime 14.27.29016 h5e58377_2 defaults wandb 0.15.10 pypi_0 pypi watchfiles 0.19.0 pypi_0 pypi websockets 11.0.2 pypi_0 pypi werkzeug 2.3.3 pypi_0 pypi wheel 0.40.0 pypi_0 pypi win_inet_pton 1.1.0 py310haa95532_0 defaults xformers 0.0.19 pypi_0 pypi xxhash 3.2.0 pypi_0 pypi xz 5.4.2 h8cc25b3_0 defaults yarl 1.9.2 pypi_0 pypi zarr 2.14.2 pypi_0 pypi zlib 1.2.13 h8cc25b3_0 defaults zstandard 0.21.0 pypi_0 pypi zstd 1.5.5 hd43e919_0 defaults

Screenshot

No response

Logs

Traceback (most recent call last):
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\utils\cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\text-generation-webui\modules\ui_model_menu.py", line 194, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\text-generation-webui\modules\models.py", line 77, in load_model
    output = load_func_map[loader](model_name)
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\text-generation-webui\modules\models.py", line 342, in ExLlamav2_HF_loader
    from modules.exllamav2_hf import Exllamav2HF
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\text-generation-webui\modules\exllamav2_hf.py", line 6, in <module>
    from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\model.py", line 12, in <module>
    from exllamav2.linear import ExLlamaV2Linear
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\linear.py", line 4, in <module>
    from exllamav2 import ext
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\ext.py", line 118, in <module>
    exllamav2_ext = load \
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\utils\cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\utils\cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\utils\cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\utils\cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'exllamav2_ext': [1/12] cl /showIncludes -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Ox -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cpp\sampling.cpp /Fosampling.o
FAILED: sampling.o
cl /showIncludes -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Ox -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cpp\sampling.cpp /Fosampling.o
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30148 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cpp\sampling.h(4): fatal error C1083: Cannot open include file: 'cstdlib': No such file or directory
[2/12] cl /showIncludes -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Ox -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\ext.cpp /Foext.o
FAILED: ext.o
cl /showIncludes -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Ox -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\ext.cpp /Foext.o
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30148 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\c10/macros/Macros.h(3): fatal error C1083: Cannot open include file: 'cassert': No such file or directory
[3/12] cl /showIncludes -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Ox -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cpp\quantize_func.cpp /Foquantize_func.o
FAILED: quantize_func.o
cl /showIncludes -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Ox -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cpp\quantize_func.cpp /Foquantize_func.o
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30148 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\c10/macros/Macros.h(3): fatal error C1083: Cannot open include file: 'cassert': No such file or directory
[4/12] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output rope.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -lineinfo -O3 -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cuda\rope.cu -o rope.cuda.o
rope.cu
[5/12] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output pack_tensor.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -lineinfo -O3 -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cuda\pack_tensor.cu -o pack_tensor.cuda.o
pack_tensor.cu
[6/12] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output q_matrix.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -lineinfo -O3 -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cuda\q_matrix.cu -o q_matrix.cuda.o
q_matrix.cu
[7/12] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output rms_norm.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -lineinfo -O3 -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cuda\rms_norm.cu -o rms_norm.cuda.o
rms_norm.cu
[8/12] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output quantize.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -lineinfo -O3 -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cuda\quantize.cu -o quantize.cuda.o
quantize.cu
[9/12] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output q_attn.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -lineinfo -O3 -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cuda\q_attn.cu -o q_attn.cuda.o
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(77): here

E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(2327): here
            instantiation of "__nv_bool c10::TensorImpl::SetDimsTemplate(c10::ArrayRef<T>) [with T=int64_t, <unnamed>=void]"
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(2337): here

q_attn.cu
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(212): warning C4624: 'c10::constexpr_storage_t<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(411): note: see reference to class template instantiation 'c10::constexpr_storage_t<T>' being compiled
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to class template instantiation 'c10::trivially_copyable_optimization_optional_base<T>' being compiled
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to alias template instantiation 'c10::OptionalBase<c10::SymInt>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(1602): note: see reference to class template instantiation 'c10::optional<c10::SymInt>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(446): warning C4624: 'c10::trivially_copyable_optimization_optional_base<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(212): warning C4624: 'c10::constexpr_storage_t<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(411): note: see reference to class template instantiation 'c10::constexpr_storage_t<T>' being compiled
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to class template instantiation 'c10::trivially_copyable_optimization_optional_base<T>' being compiled
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to alias template instantiation 'c10::OptionalBase<c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/impl/InlineDeviceGuard.h(427): note: see reference to class template instantiation 'c10::optional<c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/DeviceGuard.h(178): note: see reference to class template instantiation 'c10::impl::InlineOptionalDeviceGuard<c10::impl::VirtualGuardImpl>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(446): warning C4624: 'c10::trivially_copyable_optimization_optional_base<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
[10/12] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output q_mlp.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -lineinfo -O3 -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cuda\q_mlp.cu -o q_mlp.cuda.o
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(77): here

E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(2327): here
            instantiation of "__nv_bool c10::TensorImpl::SetDimsTemplate(c10::ArrayRef<T>) [with T=int64_t, <unnamed>=void]"
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(2337): here

q_mlp.cu
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(212): warning C4624: 'c10::constexpr_storage_t<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(411): note: see reference to class template instantiation 'c10::constexpr_storage_t<T>' being compiled
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to class template instantiation 'c10::trivially_copyable_optimization_optional_base<T>' being compiled
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to alias template instantiation 'c10::OptionalBase<c10::SymInt>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(1602): note: see reference to class template instantiation 'c10::optional<c10::SymInt>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(446): warning C4624: 'c10::trivially_copyable_optimization_optional_base<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(212): warning C4624: 'c10::constexpr_storage_t<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(411): note: see reference to class template instantiation 'c10::constexpr_storage_t<T>' being compiled
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to class template instantiation 'c10::trivially_copyable_optimization_optional_base<T>' being compiled
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to alias template instantiation 'c10::OptionalBase<c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/impl/InlineDeviceGuard.h(427): note: see reference to class template instantiation 'c10::optional<c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/DeviceGuard.h(178): note: see reference to class template instantiation 'c10::impl::InlineOptionalDeviceGuard<c10::impl::VirtualGuardImpl>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(446): warning C4624: 'c10::trivially_copyable_optimization_optional_base<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
[11/12] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output q_gemm.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\TH -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IE:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -lineinfo -O3 -c E:\Projects\cpy\llms\WebUI\textgen-webui-gpu\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cuda\q_gemm.cu -o q_gemm.cuda.o
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(77): here

E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(2327): here
            instantiation of "__nv_bool c10::TensorImpl::SetDimsTemplate(c10::ArrayRef<T>) [with T=int64_t, <unnamed>=void]"
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(2337): here

q_gemm.cu
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(212): warning C4624: 'c10::constexpr_storage_t<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(411): note: see reference to class template instantiation 'c10::constexpr_storage_t<T>' being compiled
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to class template instantiation 'c10::trivially_copyable_optimization_optional_base<T>' being compiled
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to alias template instantiation 'c10::OptionalBase<c10::SymInt>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/TensorImpl.h(1602): note: see reference to class template instantiation 'c10::optional<c10::SymInt>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(446): warning C4624: 'c10::trivially_copyable_optimization_optional_base<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::SymInt
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(212): warning C4624: 'c10::constexpr_storage_t<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(411): note: see reference to class template instantiation 'c10::constexpr_storage_t<T>' being compiled
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to class template instantiation 'c10::trivially_copyable_optimization_optional_base<T>' being compiled
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(549): note: see reference to alias template instantiation 'c10::OptionalBase<c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/impl/InlineDeviceGuard.h(427): note: see reference to class template instantiation 'c10::optional<c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/core/DeviceGuard.h(178): note: see reference to class template instantiation 'c10::impl::InlineOptionalDeviceGuard<c10::impl::VirtualGuardImpl>' being compiled
E:/Projects/cpy/llms/WebUI/textgen-webui-gpu/installer_files/env/lib/site-packages/torch/include\c10/util/Optional.h(446): warning C4624: 'c10::trivially_copyable_optimization_optional_base<T>': destructor was implicitly defined as deleted
        with
        [
            T=c10::impl::InlineDeviceGuard<c10::impl::VirtualGuardImpl>
        ]
ninja: build stopped: subcommand failed.

System Info

Local install
Windows 11
RTX 4090
YakuzaSuske commented 12 months ago

I also cannot load any models with ExLlamav2 (screenshots attached).

YakuzaSuske commented 12 months ago

Did a little digging and saw this for the "module not found" error (screenshot attached). Not really sure how to fix it, though.

Found someone who I think fixed it (screenshot attached).

YakuzaSuske commented 12 months ago

I managed to fix my issue now. I had to install the Visual Studio Build Tools (screenshot attached): the first option that mentions C++, installed with the default components. Then I just restarted and it loaded the model fine.
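
If it helps anyone hitting the same thing: one way to confirm the compiler is actually reachable before retrying the model load, using only standard Windows commands, is to open text-generation-webui\cmd_windows.bat (or a "Developer Command Prompt for VS") and run:

where cl
cl

If "where cl" prints a path ending in ...\VC\Tools\MSVC\<version>\bin\Hostx64\x64 and "cl" prints the compiler banner, the JIT build should at least be able to find the compiler.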

noobmaster29 commented 12 months ago

Installing the Visual Studio Build Tools didn't solve it for me. I'm running Windows 11 and I'm also getting the "DLL load failed" error.

Timberfang commented 11 months ago

I'm having this issue in the Docker container version using Windows 11 with WSL 2.

esherriff commented 11 months ago

I was having this issue and am now facing a related problem. To get past the original errors I installed G++ in MSYS64 and added it to my path. Then I started getting errors saying it couldn't find cl.exe, so I installed Visual Studio Community 2022 with the C++ and Linux build tools enabled and added cl.exe to my path. Then I installed the CUDA 11.7 toolkit for Windows from NVIDIA's website.

G++, cl and nvcc are all now in my path and can be called from a terminal. Nevertheless, when trying to load anything in ExLlamav2 from a Windows oobabooga 1.6 install, the ninja build still fails with errors like: **D:\AI\LLM\text-generation-webui\installer_files\env\bin\nvcc --generate-dependencies-with-compile --dependency-output q_attn.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -ID:\AI\LLM\text-generation-webui\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext -ID:\AI\LLM\text-generation-webui\installer_files\env\lib\site-packages\torch\include -ID:\AI\LLM\text-generation-webui\installer_files\env\lib\site-packages\torch\include\torch\csrc\api\include -ID:\AI\LLM\text-generation-webui\installer_files\env\lib\site-packages\torch\include\TH -ID:\AI\LLM\text-generation-webui\installer_files\env\lib\site-packages\torch\include\THC -ID:\AI\LLM\text-generation-webui\installer_files\env\include -ID:\AI\LLM\text-generation-webui\installer_files\env\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -lineinfo -O3 -c D:\AI\LLM\text-generation-webui\installer_files\env\lib\site-packages\exllamav2\exllamav2_ext\cuda\q_attn.cu -o q_attn.cuda.o CreateProcess failed: The system cannot find the file specified.** So I don't know what's wrong.
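
The path in that command suggests the build is resolving nvcc inside the conda env (installer_files\env\bin\nvcc) rather than the system CUDA toolkit, which may be the underlying problem. One way to check which nvcc PyTorch's JIT builder will actually use (torch.utils.cpp_extension exposes a CUDA_HOME variable for this; these are standard torch and Windows commands, nothing webui-specific) is to run, from the same prompt:

python -c "import torch.utils.cpp_extension as ce; print(ce.CUDA_HOME)"
where nvcc

If CUDA_HOME points at a directory with no bin\nvcc.exe in it, the ninja step fails with exactly this kind of CreateProcess error.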

Is there some plan to resolve this dependency? It is not reasonable to expect ordinary users to manually install multiple toolchains and set up their paths just to compile a single dependency for ExLlamav2.

mushinbush commented 11 months ago

Is there some plan to resolve this dependency? It is not reasonable to expect ordinary users to manually install multiple toolchains and set up their paths just to compile a single dependency for ExLlamav2.

I encountered a similar error to what you described, but I managed to resolve it successfully. I'm sharing my method in hopes that it helps you as well.

You should follow the steps to install nvcc as mentioned in #3881:

conda install cuda -c nvidia/label/cuda-11.7.1
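
As a quick sanity check after that install (standard commands only, nothing specific to the webui), you can confirm nvcc actually ended up inside the webui's own environment by running, from text-generation-webui\cmd_windows.bat:

nvcc --version
where nvcc

The reported release should match the 11.7 toolkit installed by the conda command above.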

During my installation process, I also encountered errors. To resolve them, I moved my 'oobabooga' folder from 'D:\Installation\oobabooga' to the root directory of the system 'C:\oobabooga'. After doing this, I attempted the installation again, and it succeeded.

Finally, I loaded the model again using ExLlamav2, and I didn't encounter any issues.

esherriff commented 11 months ago

I encountered a similar error to what you described, but I managed to resolve it successfully. I'm sharing my method in hopes that it helps you as well.

You should follow the steps to install nvcc as mentioned in #3881:

conda install cuda -c nvidia/label/cuda-11.7.1

During my installation process, I also encountered errors. To resolve them, I moved my 'oobabooga' folder from 'D:\Installation\oobabooga' to the root directory of the system 'C:\oobabooga'. After doing this, I attempted the installation again, and it succeeded.

Finally, I loaded the model again using ExLlamav2, and I didn't encounter any issues.

I don't have enough disk space on C: to put oobabooga there; it needs to stay on D:. So that's not an option. I tried installing CUDA into the oobabooga environment by opening a terminal in installer_files/env and then calling ..\conda\condabin\conda.bat install cuda -c nvidia/label/cuda-11.7.1

Based on my reading of the various oobabooga batch files, I thought this would activate the oobabooga environment and install CUDA using its internal Miniconda, but it seems to have installed CUDA into the conda folder instead. Unsurprisingly I am still getting basically the same errors on the console, though I now have a CUDA install scattered across the wrong folder and will have to clean it out.

I am actually surprised that exllamav2 has been merged into main in this broken state, without even an entry for it in requirements.txt to handle this. It really should have been left as a feature branch rather than expecting users to fix all of this themselves.

Here are some suggestions for the devs:

  1. Back out exllamav2 from main until it is fixed in a feature branch.
  2. If you expect users to set up environments themselves from your instructions, you had better break it down step by step ("Barney style"), not just post single command-line calls for tools the average user never touches (not everyone uses Python, so stop expecting them to know how).
  3. If a given feature is experimental and is not expected to work on a one-click install, state this in the documentation or the UI.
esherriff commented 11 months ago

Ok here is the correct sequence that got it working for me

  1. Migrated to the new one-click install method by moving the text-generation-webui folder up a level (out of /oobabooga/), or just use a fresh install of oobabooga release 1.6+
  2. Ran text-generation-webui\update_windows.bat
  3. Installed the latest version of Visual Studio Community and ticked "Desktop development with C++" (apparently vs_buildtools is also an option, but you still need to ensure cl.exe is on the path).
  4. Checked that cl.exe is on the path (edit the system environment variables and add C:\Program Files\Microsoft Visual Studio\2022\VC\Tools\MSVC\14.37.32822\bin\Hostx64\x64 to it if it's not there).
  5. Ran text-generation-webui\cmd_windows.bat
  6. conda install cuda -c nvidia/label/cuda-11.7.1
  7. pip install -U ninja exllamav2

The crucial bit I was missing was that I needed to install CUDA into the oobabooga environment by running conda from text-generation-webui\cmd_windows.bat.

Installing the CUDA 11.7 toolkit for Windows (the obvious thing) still gave build errors despite nvcc being on the path.
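
A quick way to confirm the extension now builds (importing the package is what triggers the JIT compile, as the traceback above shows) is to run, from text-generation-webui\cmd_windows.bat:

python -c "import exllamav2; print('exllamav2 import OK')"

The first run takes a while because ninja compiles the CUDA kernels; later loads should reuse the cached build.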

github-actions[bot] commented 10 months ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.