turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

[BUG] Convert script fails to run on `master` branch as of v0.2.3 #655

Open iamwavecut opened 1 month ago

iamwavecut commented 1 month ago

OS

Linux

GPU Library

CUDA 12.x

Python version

3.10

Pytorch version

2.4.1+cu121

Model

Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24

Describe the bug

  File "/mnt/disk2/.cache/plotai/exllamav2/exllamav2/stloader.py", line 160, in get_tensor
    ext_c.stloader_read(
AttributeError: module 'exllamav2_ext' has no attribute 'stloader_read'

Reproduction steps

git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -r requirements.txt
pip install .

huggingface-cli download --local-dir ./Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24 --local-dir-use-symlinks False Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24

python -m exllamav2.conversion.convert_exl2 -i "./Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24" -o "./Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24-EXL2" -cf "./Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24-EXL2/3.5bpw" -b 3.5 -hb 8 -rs 1  -l 512 -hsol 999

Expected behavior

No error

Logs

 -- Beginning new job
 -- Input: ../Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24
 -- Output: ../Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24-EXL2
 -- Calibration dataset: IlyaGusev/rulm, 100 / 16 rows, 512 tokens per sample
 -- Target bits per weight: 3.5 (decoder), 8 (head)
 -- Max shard size: 8192 MB
 -- RoPE scale: 1.00
 -- Full model will be compiled to: ../Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24-EXL2/3.5bpw
 -- Tokenizing samples (measurement)...
 -- First 50 tokens of dataset:
    '1 неделя, начало: построение режима\nВсем привет, очень много вопросов и советов было под первым постом, что меня очень удивило, количество подписчиков за пару дней перевалило за 50'
 -- Last 50 tokens of dataset:
    ' незаслуженно приговорили меня к смерти. Я пришёл в Цайран, чтобы рассказать правду о случившемся в Румере! Я признаю, что состоял в связи с драгной Озари'
 -- Token embeddings (measurement)...
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/disk2/.cache/plotai/exllamav2/exllamav2/conversion/convert_exl2.py", line 245, in <module>
    embeddings(job, save_job, model)
  File "/mnt/disk2/.cache/plotai/exllamav2/exllamav2/conversion/measure.py", line 81, in embeddings
    module.load()
  File "/mnt/disk2/.cache/python_user_base/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/disk2/.cache/plotai/exllamav2/exllamav2/embedding.py", line 48, in load
    w = self.load_weight()
  File "/mnt/disk2/.cache/plotai/exllamav2/exllamav2/module.py", line 136, in load_weight
    tensors = self.load_multi(key, ["weight"], cpu = cpu)
  File "/mnt/disk2/.cache/plotai/exllamav2/exllamav2/module.py", line 89, in load_multi
    tensors[k] = stfile.get_tensor(key + "." + k, device = self.device() if not cpu else "cpu")
  File "/mnt/disk2/.cache/plotai/exllamav2/exllamav2/stloader.py", line 160, in get_tensor
    ext_c.stloader_read(
AttributeError: module 'exllamav2_ext' has no attribute 'stloader_read'

Additional context

No response

turboderp commented 1 month ago

You're using the 0.2.3 version of the repo with an older version of the exllamav2 library installed.

iamwavecut commented 1 month ago

oh wow, something got messed up on my end, will rebuild from scratch and report back later then

iamwavecut commented 1 month ago

A crude rebuild did not fix the problem: the extension was still being loaded from the wrong, more general path instead of the nested venv one. I identified this by inspecting the load path of the exllamav2.ext.ext_c module. Removing the old file resolved the problem by letting the freshly built extension load.
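
Roughly, a check like the following shows where each half is loaded from (a minimal diagnostic sketch, not an official tool; the module name exllamav2_ext is taken from the traceback above):

# Sketch: verify the Python package and the compiled extension come from the
# same installation. A stale extension picked up from an older site-packages
# copy will be missing newer symbols such as stloader_read.
import exllamav2
import exllamav2_ext  # module name as reported in the traceback

print("python package:   ", exllamav2.__file__)
print("compiled extension:", exllamav2_ext.__file__)
print("has stloader_read: ", hasattr(exllamav2_ext, "stloader_read"))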

However, maybe this is a sign that a simple integrity check between the library and the loaded extension should be introduced? E.g. just compare the version strings bundled into each.
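
A minimal sketch of what that could look like, assuming the C++ extension started exporting a version string (the ext_c.__version__ attribute used here is hypothetical; exporting it is exactly the change being proposed):

# Hypothetical sketch only: ext_c.__version__ does not exist today.
def check_ext_matches_lib(ext_c, lib_version: str) -> None:
    ext_version = getattr(ext_c, "__version__", None)
    if ext_version != lib_version:
        raise ImportError(
            f"exllamav2 {lib_version} loaded a compiled extension reporting "
            f"version {ext_version!r}. Remove the stale exllamav2_ext build "
            f"(see its __file__) and reinstall the package."
        )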

bump @turboderp

UnstableLlama commented 1 month ago

I had the same error with a python venv while running a script in the repo cloned on my PC.

This fixed it:

pip uninstall exllamav2
pip install -e .

turboderp commented 1 month ago

Yes, that will install the current version. If you're on the dev branch, you sometimes have to do that, because changes in the C++ extension have to be reflected in the Python code as well. But if you've cloned the repo from the main branch, you can also just install the most recent prebuilt wheel and that should work too.

Difficulties only arise if you have, say, v0.2.1 installed in your venv and you're cloning the main branch. Then you end up running a mix of code from two different versions and stuff breaks.

I've considered adding functions to try to detect that, since apparently it's a very common mistake people make, but it's hard to guarantee that the right version of those validation functions would be called.