turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Quantization is no longer possible in version 0.0.12 #303

Closed: Nyandaro closed this issue 5 months ago

Nyandaro commented 5 months ago

Quantization no longer works in version 0.0.12, but it still works in version 0.0.11.

```
PS C:\exllamav2-0.0.12> python convert.py -i mnt/models/TinyLlama-1.1B-python-v0.1/ -o mnt/models/temp/ -cf mnt/models/TinyLlama-1.1B-python-v0.1-4.0bpw-exl2/ -b 4.0
 -- Resuming job
 !! Note: Overriding options with settings from existing job
 -- Input: mnt/models/TinyLlama-1.1B-python-v0.1/
 -- Output: mnt/models/temp/
 -- Using default calibration dataset
 -- Target bits per weight: 4.0 (decoder), 6 (head)
 -- Max shard size: 8192 MB
 -- RoPE scale: 1.00
 -- RoPE alpha: 1.00
 -- Full model will be compiled to: mnt/models/TinyLlama-1.1B-python-v0.1-4.0bpw-exl2/
Traceback (most recent call last):
  File "C:\exllamav2-0.0.12\convert.py", line 185, in <module>
    model.load(lazy = True)
  File "C:\exllamav2-0.0.12\exllamav2\model.py", line 248, in load
    for item in f: return item
  File "C:\exllamav2-0.0.12\exllamav2\model.py", line 276, in load_gen
    cleanup_stfiles()
  File "C:\exllamav2-0.0.12\exllamav2\fasttensors.py", line 30, in cleanup_stfiles
    ext_c.safetensors_free_pinned_buffer()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'exllamav2_ext' has no attribute 'safetensors_free_pinned_buffer'
```

I ran the quantization the same way for both versions.

OS: Windows 10, CPU: Ryzen 7 5700G, GPU: RTX 4070 12GB, RAM: 32GB
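For context on the traceback: the 0.0.12 Python code calls `safetensors_free_pinned_buffer` in the compiled extension `exllamav2_ext`, so the AttributeError indicates that the extension actually being imported is an older build missing that symbol. A minimal diagnostic sketch, assuming only that the binding ships with 0.0.12 (as the traceback suggests) and that the prebuilt extension is importable as `exllamav2_ext`:

```python
# Diagnostic sketch: does the compiled extension export the symbol
# the 0.0.12 Python code expects? (Assumption: the binding was added
# in 0.0.12, per the traceback above.)
import exllamav2_ext

print(hasattr(exllamav2_ext, "safetensors_free_pinned_buffer"))
# False here would mean a stale pre-0.0.12 extension is being imported.
```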

turboderp commented 5 months ago

Do you still have v0.0.11 installed?

```
pip show exllamav2
```
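To see which copy Python actually imports (a leftover 0.0.11 install or source tree earlier on `sys.path` can shadow the new wheel), a quick sketch, assuming the package exposes `__version__`:

```python
import exllamav2

# The reported path reveals whether the import resolves to the
# pip-installed wheel or to a leftover source checkout.
print(exllamav2.__version__)
print(exllamav2.__file__)
```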

Nyandaro commented 5 months ago

Thank you

```
pip uninstall exllamav2
pip cache purge
pip install exllamav2
```

```
pip uninstall torch
pip cache purge
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

or

```
pip uninstall torch torchvision torchaudio
pip cache purge
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

Note: then reinstall exllamav2 with either

```
pip install exllamav2
```

or

```
pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.12/exllamav2-0.0.12+cu121-cp311-cp311-win_amd64.whl
```
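After reinstalling, a quick sanity check before re-running convert.py (same assumptions as the sketches above):

```python
import exllamav2
import exllamav2_ext

print(exllamav2.__version__)  # expected: 0.0.12
print(hasattr(exllamav2_ext, "safetensors_free_pinned_buffer"))  # expected: True
```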

Version 0.0.12 is now installed correctly, and quantization works again.