turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Installing exllama failed #448

Closed freQuensy23-coder closed 2 weeks ago

freQuensy23-coder commented 1 month ago

I'm having trouble getting the library to work.

Code:

import sys, os
# sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache,
    ExLlamaV2Tokenizer,
)

from exllamav2.generator import (
    ExLlamaV2StreamingGenerator,
    ExLlamaV2Sampler
)

throws an error:

RuntimeError: Error building extension 'exllamav2_ext': [1/43] c++ -MMD -MF profiling.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/TH -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/THC -isystem /include -isystem /home/alexeyv3/.conda/envs/exllama2/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext/cpp/profiling.cpp -o profiling.o 
[2/43] c++ -MMD -MF sampling_avx2.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/TH -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/THC -isystem /include -isystem /home/alexeyv3/.conda/envs/exllama2/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext/cpp/sampling_avx2.cpp -o sampling_avx2.o 
[3/43] c++ -MMD -MF sampling.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/TH -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/THC -isystem /include -isystem /home/alexeyv3/.conda/envs/exllama2/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext/cpp/sampling.cpp -o sampling.o 
[4/43] /bin/nvcc --generate-dependencies-with-compile --dependency-output kernel_select.cuda.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/TH -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/THC -isystem /include -isystem /home/alexeyv3/.conda/envs/exllama2/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext/cuda/comp_units/kernel_select.cu -o kernel_select.cuda.o 
FAILED: kernel_select.cuda.o 

<...>

/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
[42/43] /bin/nvcc --generate-dependencies-with-compile --dependency-output unit_exl2_3b.cuda.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/TH -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/THC -isystem /include -isystem /home/alexeyv3/.conda/envs/exllama2/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext/cuda/comp_units/unit_exl2_3b.cu -o unit_exl2_3b.cuda.o 
FAILED: unit_exl2_3b.cuda.o 
/bin/nvcc --generate-dependencies-with-compile --dependency-output unit_exl2_3b.cuda.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/TH -isystem /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/torch/include/THC -isystem /include -isystem /home/alexeyv3/.conda/envs/exllama2/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/alexeyv3/.conda/envs/exllama2/lib/python3.11/site-packages/exllamav2/exllamav2_ext/cuda/comp_units/unit_exl2_3b.cu -o unit_exl2_3b.cuda.o 
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
ninja: build stopped: subcommand failed.

Full exception message here - https://gist.github.com/freQuensy23-coder/5faf1836f7aa007f6b58b32fb8c0c2d5

Steps to reproduce:
1) Create a new conda env
2) Clone the exllamav2 repo
3) pip install -r requirements.txt
4) pip install . (EXLLAMA_NOCOMPILE= pip install . gives the same error)
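
For reference, those steps correspond roughly to the commands below (a sketch; the environment name and Python version are placeholders I am assuming, not taken from the report above):

conda create -n exllama2 python=3.11 -y
conda activate exllama2
git clone https://github.com/turboderp/exllamav2.git
cd exllamav2
pip install -r requirements.txt
pip install .
# reported to fail the same way:
# EXLLAMA_NOCOMPILE= pip install .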

nvidia-smi returns:

Thu May  9 18:47:29 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:B1:00.0 Off |                    0 |
| N/A   51C    P0              86W / 300W |  17667MiB / 81920MiB |      5%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+


Torch works with this GPU correctly.

Could someone please help investigate this build error with the exllamav2 package? Thank you.
turboderp commented 1 month ago

This is likely related to your CUDA installation. Make sure:
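
As a general sanity check for this kind of failure (an illustrative sketch, not necessarily the checklist meant above), the following commands show which nvcc, gcc, and CUDA toolkit the JIT build will pick up, and which CUDA version PyTorch was built against:

which nvcc
nvcc --version
gcc --version
python -c "import torch; print(torch.__version__, torch.version.cuda)"
echo "$CUDA_HOME"

In the log above the build uses /bin/nvcc together with the GCC 11 system headers, so a version mismatch between the system toolkit, the host compiler, and the PyTorch build is a plausible culprit.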

PenutChen commented 1 month ago

I recommend developing in a container. Here is my Dockerfile:

# The -devel image ships an nvcc matching the CUDA build of the bundled PyTorch (12.1)
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-devel
RUN apt-get update
RUN apt-get install -y git

WORKDIR /workspace/exl2
RUN git clone https://github.com/turboderp/exllamav2.git .
RUN pip install -r requirements.txt

# Build the extension against the toolkit inside the image
ENV CUDA_HOME=/usr/local/cuda/
# Compute capability to compile for (7.5 = Turing; an A100 would be 8.0)
ENV TORCH_CUDA_ARCH_LIST="7.5"

RUN pip install .
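
One way to build and try the image (a sketch; the exl2-dev tag is arbitrary, and --gpus all assumes the NVIDIA Container Toolkit is installed on the host):

docker build -t exl2-dev .
docker run --gpus all -it --rm exl2-dev python -c "import exllamav2; print('ok')"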
turboderp commented 2 weeks ago

Closing some stale issues