turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

TypeError: unhashable type: 'slice' when converting and quantizing #287

Closed tm17-abcgen closed 5 months ago

tm17-abcgen commented 5 months ago

This happens when I run

python convert.py -i ./models/DiscoLM_German_7b_v1/ -cf ./models/DiscoLM_German_7b_v1-exl2/4.0bpw/ -b 4.0 -o ./temp/exl2

with this model: https://huggingface.co/DiscoResearch/DiscoLM_German_7b_v1

How I set up the repository:

Download the latest (0.0.11) release of exllamav2, go into the repo directory, create a venv, activate it, then:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

Afterwards the conversion started, then I got the following error (retrying after the first failure gives the same error message):

-- Resuming job
 !! Note: Overriding options with settings from existing job
 -- Input: ./models/DiscoLM_German_7b_v1/
 -- Output: ./temp/exl2
 -- Using default calibration dataset
 -- Target bits per weight: 4.0 (decoder), 6 (head)
 -- Max shard size: 8192 MB
 -- RoPE scale: 1.00
 -- RoPE alpha: 1.00
 -- Full model will be compiled to: ./models/DiscoLM_German_7b_v1-exl2/4.0bpw/ 
 -- Quantizing...
 -- Layer: model.layers.30 (Attention)
 -- Linear: model.layers.30.self_attn.q_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.23 bpw
 -- Linear: model.layers.30.self_attn.k_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.23 bpw
 -- Linear: model.layers.30.self_attn.v_proj -> 1:5b_32g s4, 5.13 bpw
 -- Linear: model.layers.30.self_attn.o_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.23 bpw
 -- Module quantized, rfn_error: 0.012517
 -- Layer: model.layers.30 (MLP)
 -- Linear: model.layers.30.mlp.gate_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.23 bpw
 -- Linear: model.layers.30.mlp.up_proj -> 0.25:5b_32g/0.75:4b_32g s4, 4.38 bpw
 -- Linear: model.layers.30.mlp.down_proj -> 0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4, 4.43 bpw
 -- Module quantized, rfn_error: 0.029107
 -- Layer: model.layers.31 (Attention)
 -- Linear: model.layers.31.self_attn.q_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.23 bpw
 -- Linear: model.layers.31.self_attn.k_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.23 bpw
 -- Linear: model.layers.31.self_attn.v_proj -> 1:5b_32g s4, 5.13 bpw
 -- Linear: model.layers.31.self_attn.o_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.23 bpw
 -- Module quantized, rfn_error: 0.013678
 -- Layer: model.layers.31 (MLP)
 -- Linear: model.layers.31.mlp.gate_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.23 bpw
 -- Linear: model.layers.31.mlp.up_proj -> 0.25:5b_32g/0.75:4b_32g s4, 4.38 bpw
 -- Linear: model.layers.31.mlp.down_proj -> 0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4, 4.43 bpw
 -- Module quantized, rfn_error: 0.027660
 -- Layer: model.norm (RMSNorm)
 -- Module quantized, rfn_error: 0.000000
 -- Layer: lm_head (Linear)
 -- Linear: lm_head -> 0.15:8b_128g/0.85:6b_128g s4, 6.34 bpw
Traceback (most recent call last):
  File "E:\MLDLProjects\exllamav2-0.0.11_v2\convert.py", line 250, in <module> 
    quant(job, save_job, model)
  File "E:\MLDLProjects\exllamav2-0.0.11_v2\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\MLDLProjects\exllamav2-0.0.11_v2\conversion\quantize.py", line 367, in quant
    if module.padding > 0: outputs = outputs[:, :, :-module.padding]
TypeError: unhashable type: 'slice'

Adding the following debugging code in conversion/quantize.py, I got the following hint:

print(f"Type of module.padding: {type(module.padding)}") 
if not isinstance(module.padding, int):
    raise TypeError(f"Expected module.padding to be an int, got {type(module.padding)} instead")

if module.padding > 0: outputs = outputs[:, :, :-module.padding]
Traceback (most recent call last):
  File "E:\MLDLProjects\exllamav2-0.0.11\convert.py", line 250, in <module>
    quant(job, save_job, model)
  File "E:\MLDLProjects\exllamav2-0.0.11\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\MLDLProjects\exllamav2-0.0.11\conversion\quantize.py", line 371, in quant
    raise TypeError(f"Expected outputs to be a torch.Tensor, got {type(outputs)} instead")
TypeError: Expected outputs to be a torch.Tensor, got <class 'dict'> instead
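
That dict also explains the first traceback: when outputs is a dict instead of a tensor, outputs[:, :, :-module.padding] becomes a dict lookup whose key is a tuple of slice objects, and slices are not hashable (on Python versions before 3.12). A minimal sketch outside of exllamav2, with a made-up dict standing in for whatever outputs actually contained:

import torch

# Hypothetical stand-in for whatever dict ended up in `outputs`
outputs = {"hidden_states": torch.zeros(1, 8, 16)}
padding = 2

# dict.__getitem__ has to hash the key (slice(None), slice(None), slice(None, -2)),
# which fails with: TypeError: unhashable type: 'slice'
outputs[:, :, :-padding]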

How do I solve this? Thanks in advance.

turboderp commented 5 months ago

I can't seem to reproduce this, even with a clean venv and installing torch and requirements.txt as you suggest.

Are you also installing the 0.0.11 prebuilt wheel to use alongside the dev version in the repo?
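
One quick way to check which build the venv actually resolves (just a suggestion, not part of the conversion scripts) is to print where the package is imported from:

python -c "import exllamav2; print(exllamav2.__file__)"

If that points into venv\lib\site-packages rather than the repo checkout, the prebuilt wheel is the one being used.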

tm17-abcgen commented 5 months ago

Pulling the dev version:

git clone https://github.com/turboderp/exllamav2
cd exllamav2
python -m venv venv
venv\Scripts\activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
python setup.py install --user

solved the issue, thank you.