turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

ImportError: /home/ec2-user/.cache/torch_extensions/py310_cu121/exllamav2_ext/exllamav2_ext.so: undefined symbol: _ZN3c104cuda14ExchangeDeviceEa #427

Closed · rjmehta1993 closed this issue 1 week ago

rjmehta1993 commented 2 months ago

ImportError: /home/ec2-user/.cache/torch_extensions/py310_cu121/exllamav2_ext/exllamav2_ext.so: undefined symbol: _ZN3c104cuda14ExchangeDeviceEa

I'm getting this error even though the CUDA versions match.

nvcc -V reports the same CUDA version that my PyTorch build uses.

electricadev commented 2 months ago

I'm also seeing a similar error using the 0.0.19 release.

I am using the following whl: https://github.com/turboderp/exllamav2/releases/download/v0.0.19/exllamav2-0.0.19+cu118-cp310-cp310-linux_x86_64.whl

This was working fine last week, but today when I rebuilt my container I started seeing this issue.

ImportError: /usr/local/lib/python3.10/dist-packages/exllamav2_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

turboderp commented 2 months ago

The reason for this particular kind of error is a version mismatch between your PyTorch binaries and the ExLlamaV2 binaries. For some reason PyTorch breaks the extension API with every new release, and I haven't found a good way to keep track of it all.

But basically, if you're building from source or using the JIT mode, everything should work regardless of version. For the prebuilt wheels you want torch==2.3.0 for exllamav2==0.0.20, or torch==2.2.0 for exllamav2==0.0.19.

You may want to use the --force-reinstall option when installing PyTorch, since some dependencies might not get fully resolved otherwise. Also make sure you keep the torchvision and torchaudio packages in sync with torch, even though they aren't used here.
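As a sanity check before importing the extension, one could compare the installed versions against the pairing above. This is a minimal sketch, not part of exllamav2 itself: the version pairs come from the comment (0.0.20 → torch 2.3.0, 0.0.19 → torch 2.2.0), and the helper name is made up.

```python
# Expected torch version for each prebuilt exllamav2 wheel series,
# per the pairing described in the comment above (an assumption, not
# an official compatibility table).
COMPATIBLE = {
    "0.0.20": "2.3.0",
    "0.0.19": "2.2.0",
}

def torch_matches(exllamav2_version: str, torch_version: str) -> bool:
    """Return True if the installed torch version matches the one the
    prebuilt exllamav2 wheel was presumably compiled against."""
    expected = COMPATIBLE.get(exllamav2_version)
    if expected is None:
        # Unknown pairing: build from source or use JIT mode instead.
        return False
    # torch version strings often carry a local build suffix like
    # "2.3.0+cu121"; strip it before comparing.
    return torch_version.split("+")[0] == expected
```

In practice you would call it with the live values, e.g. torch_matches(exllamav2.__version__, torch.__version__), and raise a clear error before the opaque undefined-symbol ImportError ever triggers.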

electricadev commented 2 months ago

Thanks that makes sense. Appreciate the guidance