turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Import error: exllama_ext.so not found #148

Closed: nivibilla closed this issue 1 year ago

nivibilla commented 1 year ago

Hey,

I'm trying to run the basic example and I get this error:

ImportError: /root/.cache/torch_extensions/py310_cu117/exllama_ext/exllama_ext.so: cannot open shared object file: No such file or directory

Here is the trace:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File <command-496841562764244>:1
----> 1 from model import ExLlama, ExLlamaCache, ExLlamaConfig

File /databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py:172, in _create_import_patch.<locals>.import_patch(name, globals, locals, fromlist, level)
    167 thread_local._nest_level += 1
    169 try:
    170     # Import the desired module. If you’re seeing this while debugging a failed import,
    171     # look at preceding stack frames for relevant error information.
--> 172     original_result = python_builtin_import(name, globals, locals, fromlist, level)
    174     is_root_import = thread_local._nest_level == 1
    175     # `level` represents the number of leading dots in a relative import statement.
    176     # If it's zero, then this is an absolute import.

File /databricks/driver/exllama/model.py:12
     10 import torch.nn.functional as F
     11 from safetensors import safe_open
---> 12 import cuda_ext
     13 import json
     14 import math

File /databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py:172, in _create_import_patch.<locals>.import_patch(name, globals, locals, fromlist, level)
    167 thread_local._nest_level += 1
    169 try:
    170     # Import the desired module. If you’re seeing this while debugging a failed import,
    171     # look at preceding stack frames for relevant error information.
--> 172     original_result = python_builtin_import(name, globals, locals, fromlist, level)
    174     is_root_import = thread_local._nest_level == 1
    175     # `level` represents the number of leading dots in a relative import statement.
    176     # If it's zero, then this is an absolute import.

File /databricks/driver/exllama/cuda_ext.py:44
     41         else:
     42             print("Unable to find cl.exe; compilation will probably fail.", file=sys.stderr)
---> 44 exllama_ext = load(
     45     name = extension_name,
     46     sources = [
     47         os.path.join(library_dir, "exllama_ext/exllama_ext.cpp"),
     48         os.path.join(library_dir, "exllama_ext/cuda_buffers.cu"),
     49         os.path.join(library_dir, "exllama_ext/cuda_func/q4_matrix.cu"),
     50         os.path.join(library_dir, "exllama_ext/cuda_func/q4_matmul.cu"),
     51         os.path.join(library_dir, "exllama_ext/cuda_func/column_remap.cu"),
     52         os.path.join(library_dir, "exllama_ext/cuda_func/rms_norm.cu"),
     53         os.path.join(library_dir, "exllama_ext/cuda_func/rope.cu"),
     54         os.path.join(library_dir, "exllama_ext/cuda_func/half_matmul.cu"),
     55         os.path.join(library_dir, "exllama_ext/cuda_func/q4_attn.cu"),
     56         os.path.join(library_dir, "exllama_ext/cuda_func/q4_mlp.cu"),
     57         os.path.join(library_dir, "exllama_ext/cpu_func/rep_penalty.cpp")
     58     ],
     59     extra_include_paths = [os.path.join(library_dir, "exllama_ext")],
     60     verbose = verbose,
     61     extra_ldflags = (["cublas.lib"] + ([f"/LIBPATH:{os.path.join(sys.base_prefix, 'libs')}"] if sys.base_prefix != sys.prefix else [])) if windows else [],
     62     extra_cuda_cflags = ["-lineinfo"] + (["-U__HIP_NO_HALF_CONVERSIONS__", "-O3"] if torch.version.hip else []),
     63     extra_cflags = ["-O3"]
     64     # extra_cflags = ["-ftime-report", "-DTORCH_USE_CUDA_DSA"]
     65 )
     67 # from exllama_ext import set_tuning_params
     68 # from exllama_ext import prepare_buffers
     69 from exllama_ext import make_q4

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e8fe6c01-b22c-4694-8bb0-41ebd93385e1/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1284, in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1192 def load(name,
   1193          sources: Union[str, List[str]],
   1194          extra_cflags=None,
   (...)
   1202          is_standalone=False,
   1203          keep_intermediates=True):
   1204     r'''
   1205     Loads a PyTorch C++ extension just-in-time (JIT).
   1206 
   (...)
   1282         ...     verbose=True)
   1283     '''
-> 1284     return _jit_compile(
   1285         name,
   1286         [sources] if isinstance(sources, str) else sources,
   1287         extra_cflags,
   1288         extra_cuda_cflags,
   1289         extra_ldflags,
   1290         extra_include_paths,
   1291         build_directory or _get_build_directory(name, verbose),
   1292         verbose,
   1293         with_cuda,
   1294         is_python_module,
   1295         is_standalone,
   1296         keep_intermediates=keep_intermediates)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e8fe6c01-b22c-4694-8bb0-41ebd93385e1/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1535, in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1532 if is_standalone:
   1533     return _get_exec_path(name, build_directory)
-> 1535 return _import_module_from_library(name, build_directory, is_python_module)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e8fe6c01-b22c-4694-8bb0-41ebd93385e1/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1929, in _import_module_from_library(module_name, path, is_python_module)
   1927 spec = importlib.util.spec_from_file_location(module_name, filepath)
   1928 assert spec is not None
-> 1929 module = importlib.util.module_from_spec(spec)
   1930 assert isinstance(spec.loader, importlib.abc.Loader)
   1931 spec.loader.exec_module(module)

ImportError: /root/.cache/torch_extensions/py310_cu117/exllama_ext/exllama_ext.so: cannot open shared object file: No such file or directory
EyeDeck commented 1 year ago

Not sure, but this looks like it might be a Linux version of #100. Make sure your PyTorch and CUDA Toolkit versions match (11.7 | 11.8 | 12.1; it doesn't matter which one specifically, just that they're both the same). Otherwise PyTorch can compile the exllama C++ extension successfully, but it won't load, because the extension will have been compiled against a different CUDA version than the one the currently-installed Torch was built for.
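
A quick way to check for that mismatch (a sketch; it assumes nvcc from the CUDA Toolkit is on PATH):

import subprocess
import torch

# CUDA version PyTorch was built against, e.g. "11.7"
print("torch CUDA:", torch.version.cuda)

# CUDA Toolkit version of the nvcc that will compile the extension
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)

If the release printed by nvcc differs from torch.version.cuda, the JIT build can succeed while producing a .so that fails to load, exactly as in the traceback above.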

nivibilla commented 1 year ago

@EyeDeck thanks, it was a mix of things.

The solution was to first make my own fork and merge the pip install capability from #125, then add some setup code to my Databricks cluster to install the necessary CUDA 11.7 dev libraries:

!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb -O /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-dev-11-7_11.10.1.25-1_amd64.deb -O /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb -O /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcurand-dev-11-7_10.2.10.91-1_amd64.deb -O /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb && \
  dpkg -i /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \
  dpkg -i /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \
  dpkg -i /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \
  dpkg -i /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb
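
After installing the dev libraries, it may also help to remove the cached failed build so torch's JIT compiler starts fresh on the next import (a hedged suggestion; the cache directory is the one shown in the traceback above):

import shutil

# Delete the stale JIT build so the extension is recompiled on the next import;
# this path comes straight from the ImportError in the traceback.
shutil.rmtree("/root/.cache/torch_extensions/py310_cu117/exllama_ext", ignore_errors=True)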

After doing all this, I was able to run the batch example by importing like this:

from exllama_lib.model import ExLlama, ExLlamaCache, ExLlamaConfig
from exllama_lib.tokenizer import ExLlamaTokenizer
from exllama_lib.generator import ExLlamaGenerator
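
For anyone landing here, the basic flow with these classes looks roughly like this (a sketch based on the repo's example scripts; the paths are placeholders you must point at a quantized Llama model):

# Importing model triggers the JIT build of exllama_ext (see the traceback above)
from exllama_lib.model import ExLlama, ExLlamaCache, ExLlamaConfig
from exllama_lib.tokenizer import ExLlamaTokenizer
from exllama_lib.generator import ExLlamaGenerator

# Placeholder paths; replace with your model directory's files
model_config_path = "/path/to/model/config.json"
model_path = "/path/to/model/model.safetensors"
tokenizer_path = "/path/to/model/tokenizer.model"

config = ExLlamaConfig(model_config_path)    # read model parameters from config.json
config.model_path = model_path               # location of the quantized weights
model = ExLlama(config)                      # load the model
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)                  # KV cache used during generation
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Hello,", max_new_tokens = 20))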

For reference my fork is at https://github.com/nivibilla/exllama

I would make a PR, but @paolorechia already has #125 open for the pip install. @turboderp, if you want, I can add these instructions to the README, but since the pip install PR isn't merged yet, I guess it makes sense to wait.