Do you still have v0.0.11 installed?
```
pip show exllamav2
```
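If the old build is still present, `pip show` will report it; the output looks roughly like this (values here are illustrative):

```
Name: exllamav2
Version: 0.0.11
Location: ...
```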
Thank you
```
pip uninstall exllamav2
pip cache purge
pip install exllamav2
```
```
pip uninstall torch
pip cache purge
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
or
```
pip uninstall torch torchvision torchaudio
pip cache purge
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
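After reinstalling, a quick sanity check confirms the new torch build is a CUDA 12.1 wheel and can see the GPU (a minimal sketch; the exact version string will differ):

```python
# Verify the reinstalled torch is a cu121 build and the GPU is visible
import torch

print(torch.__version__)              # e.g. '2.1.2+cu121' for a cu121 wheel
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # e.g. 'NVIDIA GeForce RTX 4070'
```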
Note: afterwards, reinstall exllamav2:

```
pip install exllamav2
```
or
```
pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.12/exllamav2-0.0.12+cu121-cp311-cp311-win_amd64.whl
```
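The wheel filename encodes its requirements: cp311 means CPython 3.11, cu121 means CUDA 12.1, win_amd64 means 64-bit Windows. A one-line check that the interpreter matches (a sketch, nothing more):

```python
# The cp311 tag requires CPython 3.11
import sys

print(sys.version_info[:2])  # must be (3, 11) for a cp311 wheel
```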
Version 0.0.12 is now installed correctly, and it started working.
However, quantization no longer works in version 0.0.12; it still works in version 0.0.11.
```
PS C:\exllamav2-0.0.12> python convert.py -i mnt/models/TinyLlama-1.1B-python-v0.1/ -o mnt/models/temp/ -cf mnt/models/TinyLlama-1.1B-python-v0.1-4.0bpw-exl2/ -b 4.0
 -- Resuming job
 !! Note: Overriding options with settings from existing job
 -- Input: mnt/models/TinyLlama-1.1B-python-v0.1/
 -- Output: mnt/models/temp/
 -- Using default calibration dataset
 -- Target bits per weight: 4.0 (decoder), 6 (head)
 -- Max shard size: 8192 MB
 -- RoPE scale: 1.00
 -- RoPE alpha: 1.00
 -- Full model will be compiled to: mnt/models/TinyLlama-1.1B-python-v0.1-4.0bpw-exl2/
Traceback (most recent call last):
  File "C:\exllamav2-0.0.12\convert.py", line 185, in <module>
    model.load(lazy = True)
  File "C:\exllamav2-0.0.12\exllamav2\model.py", line 248, in load
    for item in f: return item
  File "C:\exllamav2-0.0.12\exllamav2\model.py", line 276, in load_gen
    cleanup_stfiles()
  File "C:\exllamav2-0.0.12\exllamav2\fasttensors.py", line 30, in cleanup_stfiles
    ext_c.safetensors_free_pinned_buffer()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'exllamav2_ext' has no attribute 'safetensors_free_pinned_buffer'
```
I used the same quantization procedure with both versions.
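For what it's worth, this kind of AttributeError usually means the compiled extension (exllamav2_ext) that Python imports is older than the Python code calling it, e.g. a leftover pip-installed extension being picked up while convert.py runs from the 0.0.12 source tree; that is an assumption here, not confirmed. A minimal sketch to see which copies are actually imported:

```python
# Check whether the imported exllamav2 package and its compiled
# extension (exllamav2_ext) come from the same install
import exllamav2
import exllamav2_ext

print(exllamav2.__file__)      # which copy of the Python package is used
print(exllamav2_ext.__file__)  # where the compiled extension is loaded from
# False here would mean the extension predates 0.0.12 (stale install)
print(hasattr(exllamav2_ext, "safetensors_free_pinned_buffer"))
```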
OS: Windows 10, CPU: Ryzen 7 5700G, GPU: RTX 4070 12GB, RAM: 32GB