iamwavecut opened 1 month ago
You're using the 0.2.3 version of the repo with an older version of the exllamav2 library installed.
oh wow, something got messed up on my end, will rebuild from scratch and report back later then
The crude rebuild did not fix the problem: the extension was still being loaded from a more general path instead of the nested venv one. I identified this by inspecting the loading path of the exllamav2.ext.ext_c module. Removing the stale file resolved the problem by letting the freshly built extension load.
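A generic sketch of that diagnostic: ask the import system which file a module resolves to. The thread did this for exllamav2.ext.ext_c; a stdlib module stands in below so the snippet runs anywhere.

```python
# Sketch: find out which file a module would be loaded from. If this
# prints a system site-packages path while you expected your venv's
# build directory, you are loading a stale copy.
import importlib.util

def module_origin(name: str):
    """Return the file a module resolves to, or None if not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# In the thread's case you would pass "exllamav2.ext.ext_c" here.
print(module_origin("json"))
```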
However, maybe it's a sign that a simple integrity check between the library and the loaded extension should be introduced? E.g. just compare the version strings bundled into each.
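A minimal sketch of such a check. The function and parameter names are hypothetical; exllamav2 does not currently expose this exact API.

```python
# Hypothetical integrity check between the Python package and its
# compiled extension: fail loudly at import time if the extension was
# built for a different release of the library.
def verify_ext_version(lib_version: str, ext_version: str) -> None:
    if lib_version != ext_version:
        raise ImportError(
            f"Library version {lib_version} does not match compiled "
            f"extension version {ext_version}; rebuild with "
            f"`pip install -e .` or remove the stale extension."
        )

verify_ext_version("0.2.3", "0.2.3")  # matching versions pass silently
```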
bump @turboderp
I had the same error with a python venv while running a script in the repo cloned on my PC.
This fixed it:
pip uninstall exllamav2
pip install -e .
Yes, that will install the current version. If you're on the dev branch, you sometimes have to, because changes in the C++ extension have to be reflected in the Python code as well. But if you've cloned the repo from the main branch, you can also just install the most recent prebuilt wheel and that should work, too.
Difficulties only arise if you have, say, v0.2.1 installed in your venv and you're cloning the main branch. Then you end up running a mix of code from two different versions and stuff breaks.
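One way to make such a mix visible, sketched with stdlib tooling only (nothing here is exllamav2-specific): report both the pip-installed distribution version and the path the import system would actually load.

```python
# Sketch: show the installed distribution version next to the import
# path, so a mismatch between a site-packages install and a cloned
# repo stands out.
import importlib.util
from importlib import metadata

def diagnose(package: str) -> str:
    try:
        dist_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        dist_version = "not installed"
    spec = importlib.util.find_spec(package)
    origin = spec.origin if spec else "not importable"
    return f"{package}: dist={dist_version}, imports from {origin}"

# e.g. diagnose("exllamav2") in the broken setup above would show an
# installed version alongside an import path outside the venv.
# (stdlib modules like json have no dist metadata, hence "not installed")
print(diagnose("json"))
```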
I've considered adding functions to try to detect that, since apparently it's a very common mistake people make, but it's hard to guarantee that the right version of those validation functions would be called.
OS: Linux
GPU Library: CUDA 12.x
Python version: 3.10
Pytorch version: 2.4.1+cu121
Model: Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24
Expected behavior: No error