A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
Import error: exllama_ext.so not found #148

Closed: nivibilla closed this issue 1 year ago

nivibilla commented 1 year ago


I'm trying to run the basic example, but I get this error:

ImportError: /root/.cache/torch_extensions/py310_cu117/exllama_ext/exllama_ext.so: cannot open shared object file: No such file or directory

Here is the trace:

ImportError: /root/.cache/torch_extensions/py310_cu117/exllama_ext/exllama_ext.so: cannot open shared object file: No such file or directory

EyeDeck commented 1 year ago

Not sure, but this looks like a Linux version of #100. Make sure your PyTorch build and your CUDA Toolkit target the same CUDA version (11.7, 11.8, or 12.1; it doesn't matter which one, just that they're the same). Otherwise PyTorch can compile the exllama C++ extension successfully, but the extension won't load, because it will have been compiled for a different CUDA version than the one your installed Torch works with.
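
A quick way to compare the two versions is sketched below. This is only an illustration, not part of exllama: the helper names (`parse_nvcc_release`, `parse_torch_cuda`, `versions_match`, `check_environment`) are made up here, and it assumes `nvcc` is on the PATH and that its `--version` output contains the usual `release X.Y` string.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_torch_cuda(version):
    """Return (major, minor) from a string such as torch.version.cuda ('11.7')."""
    if not version:
        return None  # CPU-only PyTorch builds report None here
    major, minor = version.split(".")[:2]
    return (int(major), int(minor))


def versions_match(torch_cuda, nvcc_output):
    """True only if both versions are known and major.minor are identical."""
    torch_ver = parse_torch_cuda(torch_cuda)
    toolkit_ver = parse_nvcc_release(nvcc_output)
    return torch_ver is not None and torch_ver == toolkit_ver


def check_environment():
    """Hypothetical helper: compare the installed PyTorch against the local nvcc."""
    import torch  # imported lazily so the parsing helpers work without PyTorch

    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH; is the CUDA Toolkit installed?")
        return False
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    ok = versions_match(torch.version.cuda, result.stdout)
    print(f"torch CUDA: {torch.version.cuda}, match with nvcc: {ok}")
    return ok
```

Calling `check_environment()` in the same Python environment that runs exllama should show whether the two versions agree. If they differ, installing a matching PyTorch build or Toolkit is the likely fix.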

nivibilla commented 1 year ago

@EyeDeck thanks, it was a mix of things.

The solution was to first make my own fork and merge the pip install capability from #125, then add some code to my Databricks cluster to update the necessary libraries:

!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb -O /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-dev-11-7_11.10.1.25-1_amd64.deb -O /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb -O /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcurand-dev-11-7_10.2.10.91-1_amd64.deb -O /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb && \
  dpkg -i /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \
  dpkg -i /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \
  dpkg -i /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \
  dpkg -i /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb
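
To confirm the dev packages actually landed, something like the following could check for their headers. This is a rough sketch: `missing_headers` is a hypothetical helper, and the include directory depends on your install, so it is passed in rather than assumed.

```python
from pathlib import Path

# Main headers shipped by the cuSPARSE, cuBLAS, cuSOLVER and cuRAND dev packages.
REQUIRED_HEADERS = ["cusparse.h", "cublas_v2.h", "cusolverDn.h", "curand.h"]


def missing_headers(include_dir, headers=REQUIRED_HEADERS):
    """Return the header names that are not present in include_dir."""
    include_path = Path(include_dir)
    return [name for name in headers if not (include_path / name).is_file()]
```

For example, `missing_headers("/usr/local/cuda-11.7/include")` returns an empty list when all four headers are present. That path is an assumption about where the Toolkit lives on the cluster; adjust it to your setup.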

After doing all this, I was able to run the batch example by importing like this:

from exllama_lib.model import ExLlama, ExLlamaCache, ExLlamaConfig
from exllama_lib.tokenizer import ExLlamaTokenizer
from exllama_lib.generator import ExLlamaGenerator

For reference, my fork is at https://github.com/nivibilla/exllama

I would make a PR, but @paolorechia already has #125 for the pip install. @turboderp, if you want me to, I can add these instructions to the README. But as the pip install PR is not merged yet, I guess it makes sense to wait.