turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Freeze after import (maybe ROCm only) #256

Closed lufixSch closed 4 months ago

lufixSch commented 6 months ago

After installing exllamav2 with pip install exllamav2, importing exllamav2 makes Python freeze, and I have to exit with Ctrl + C.

I also wasn't able to get it running with the prebuilt wheels for ROCm before (undefined symbol <some symbol in exllamav2_ext.cpython-310-x86_64-linux-gnu.so> - unfortunately I did not copy the exact error). But I wasn't able to reproduce this in another venv (I did not want to break the installation in my main venv again xD).

I got it running by cloning the repository and building from source.

Exllama worked for me before. It broke right around the time I added a second GPU to my system. But because problems with PyTorch kept me from using Exllama for some weeks, I'm not sure whether that is related or something simply changed with v0.0.11.

P.S.: Is there a reason for using python setup.py install --user in the readme? It did not work for me. I use venv for virtual environments and pyenv for handling multiple Python versions.

ashleykleynhans commented 6 months ago

Freezing for me on CUDA 12.1 as well.

turboderp commented 6 months ago

With the JIT version installed, the first time you import the library it compiles the C++/CUDA extension. This also happens if there's a change to the configuration such as adding a new CUDA device.

This can be a little slow, and Torch doesn't give any feedback while the build is going on unless you ask it to be verbose (set verbose = True at the top of ext.py if you want), but then it dumps a huge amount of information to the console even when there's no work to do. I'll see if I can find some hacky way to give feedback only when it's actually building the extension. In the meantime, the latest dev version should reduce the compile time quite a bit by splitting the template instances into multiple compilation units.
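For illustration, here is a small runnable sketch of the behavior described above, assuming ext.py builds its extension through torch.utils.cpp_extension as Torch JIT extensions normally are (the toy extension below is hypothetical, not exllamav2's real sources):

    # A tiny stand-in extension, compiled the same way a JIT extension is
    # built on first import. With verbose=False the call blocks silently
    # while the compiler runs; verbose=True streams the ninja/compiler
    # output, which makes a long build distinguishable from a real hang.
    from torch.utils import cpp_extension

    ext = cpp_extension.load_inline(
        name="freeze_demo_ext",            # hypothetical name for this demo
        cpp_sources="int answer() { return 42; }",
        functions=["answer"],              # auto-generates Python bindings
        verbose=True,                      # print build steps instead of silence
    )
    print(ext.answer())  # 42 once the build finishes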

Here are some things to try:

lufixSch commented 6 months ago

I understand that the JIT version compiles on first import, but I waited a really long time and still nothing happened (way longer than it takes to build from source). As I said, when building from source it works fine. (I'm still unsure why the readme suggests the --user flag.)

After reinstalling with pip install exllamav2, I'm now able to reproduce the other error I referred to before. I get the following output on my first import of exllamav2:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/exllamav2/__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/exllamav2/model.py", line 17, in <module>
    from exllamav2.cache import ExLlamaV2CacheBase
  File "/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/exllamav2/cache.py", line 2, in <module>
    from exllamav2.ext import exllamav2_ext as ext_c
  File "/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/exllamav2/ext.py", line 15, in <module>
    import exllamav2_ext
ImportError: /data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/exllamav2_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb
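As an aside, the mangled name in an ImportError like this can be decoded into a readable C++ signature, which shows which Torch internals the prebuilt extension expects. A hedged sketch using c++filt from GNU binutils (assuming it is installed):

    # Demangle the undefined symbol from the ImportError above. The result
    # is a c10::Warning constructor taking UserWarning/DeprecationWarning
    # variant types, i.e. a Torch C++ ABI that the currently installed
    # Torch does not provide.
    import subprocess

    symbol = "_ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb"
    print(subprocess.run(["c++filt", symbol], capture_output=True, text=True).stdout)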

turboderp commented 6 months ago

The --user flag installs the package for the current user rather than system-wide, which (at least in my experience) avoids some potential complications. I've seen system-wide installs require root access, and while that may just be due to some misconfiguration in the environment, my impression has always been that the --user flag is the safer bet in any case.

As for that ImportError, I've seen it before, referring specifically to "UserWarning" and "DeprecationWarning" symbols, but I can't remember what the solution was. I think it had to do with an extension being compiled for one version of Torch and then loaded by another. Is it possible you have an old version of the extension cached in ~/.cache/torch_extensions/? Are there maybe multiple versions of Torch on your system in separate envs?
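A sketch of both checks: print the active Torch version in each venv, and clear the extension cache so the next import rebuilds against the Torch that is actually installed:

    # Compare torch.__version__ across your venvs, then delete the JIT
    # build cache; it is only a cache and is recreated on the next
    # import of exllamav2.
    import shutil
    from pathlib import Path

    import torch

    print("torch:", torch.__version__)

    cache = Path.home() / ".cache" / "torch_extensions"
    if cache.exists():
        shutil.rmtree(cache)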

lufixSch commented 6 months ago

The --user flag installs the package for the current user rather than system-wide

So this is only needed when not using a venv? When I used the flag, it installed the package to a strange location where the Python in my venv could not find it.
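That "strange location" is most likely the per-user site-packages directory, which a default venv deliberately leaves off sys.path. A quick standard-library check, as a sketch:

    # Show where --user installs land and whether this interpreter sees
    # them. In a venv created without --system-site-packages,
    # site.ENABLE_USER_SITE is False, so --user packages are invisible.
    import site
    import sys

    print("user site-packages:", site.getusersitepackages())
    print("user site enabled:", site.ENABLE_USER_SITE)
    print("in venv:", sys.prefix != sys.base_prefix)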

Is it possible you have an old version of the extension cached in ~/.cache/torch_extensions/? Are there maybe multiple versions of Torch on your system in separate envs?

That could be it. For some time I've been using the nightly build of torch (2.3), as it fixed an error I had with my GPU. Before that I used torch (2.2), and some venvs might still have that version installed.

AndriiAndrus commented 5 months ago

Same here. It was working until I trained my own model with axolotl and converted it to exl2. Now it freezes on import (from exllamav2 import ...) no matter what. I deleted the whole env and reinstalled exllamav2, but no matter which versions of exllamav2 or torch I use, it freezes on import. FYI, I'm on Ubuntu with a 3090.

hemangjoshi37a commented 5 months ago

How can I check whether it is using ROCm for generating text? Please help me.
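One way to check, assuming the usual PyTorch backend detection: ROCm builds of Torch report a HIP version and leave the CUDA version unset, so a short sketch like this distinguishes the two:

    # On a ROCm build, torch.version.hip is a version string and
    # torch.version.cuda is None; on a CUDA build it is the reverse.
    import torch

    print("HIP (ROCm):", torch.version.hip)
    print("CUDA:", torch.version.cuda)
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))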

lufixSch commented 4 months ago

Not sure what changed, but for me everything is working fine again with version 0.0.12.

AndriiAndrus commented 4 months ago

Alright, this is weird. After I fixed the freezes by reinstalling everything multiple times, it finally worked for a while. Just now I made some changes to my code (the generation/prompt part) and it freezes AGAIN! I did not touch pip at all!! This is also happening in only one of my projects; I recreated the env and all... must be a cursed folder.