ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.

How can I fix this issue? following given is the script :

!python finetune.py "/content/data.json" \
    --ds_type=alpaca \
    --lora_out_dir=./test/ \
    --llama_q4_config_dir="/content/text-generation-webui/models/wcde_llama-7b-4bit-gr128/config.json" \
    --llama_q4_model="/content/text-generation-webui/models/wcde_llama-7b-4bit-gr128/llama-7b-4bit-gr128.pt" \
    --mbatch_size=1 \
    --batch_size=4 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=128 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --groupsize=128 \
    --v1 \
    --xformers \
    --backend=cuda

and the corresponding log :

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/alpaca_lora_4bit/finetune.py:65 in <module>                         │
│                                                                              │
│    62 │   raise Exception('batch_size need to be larger than mbatch_size.')  │
│    63                                                                        │
│    64 # Load Basic Model                                                     │
│ ❱  65 model, tokenizer = load_llama_model_4bit_low_ram(ft_config.llama_q4_co │
│    66 │   │   │   │   │   │   │   │   │   │   │   │     ft_config.llama_q4_m │
│    67 │   │   │   │   │   │   │   │   │   │   │   │     device_map=ft_config │
│    68 │   │   │   │   │   │   │   │   │   │   │   │     groupsize=ft_config. │
│                                                                              │
│ /content/alpaca_lora_4bit/autograd_4bit.py:204 in                            │
│ load_llama_model_4bit_low_ram                                                │
│                                                                              │
│   201 │   │   │   if name in layers:                                         │
│   202 │   │   │   │   del layers[name]                                       │
│   203 │   │   make_quant_for_4bit_autograd(model, layers, groupsize=groupsiz │
│ ❱ 204 │   model = accelerate.load_checkpoint_and_dispatch(                   │
│   205 │   │   model=model,                                                   │
│   206 │   │   checkpoint=model_path,                                         │
│   207 │   │   device_map=device_map,                                         │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py:479 in    │
│ load_checkpoint_and_dispatch                                                 │
│                                                                              │
│   476 │   │   )                                                              │
│   477 │   if offload_state_dict is None and device_map is not None and "disk │
│   478 │   │   offload_state_dict = True                                      │
│ ❱ 479 │   load_checkpoint_in_model(                                          │
│   480 │   │   model,                                                         │
│   481 │   │   checkpoint,                                                    │
│   482 │   │   device_map=device_map,                                         │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py:993 in  │
│ load_checkpoint_in_model                                                     │
│                                                                              │
│    990 │   │   │   │   │   set_module_tensor_to_device(model, param_name, "m │
│    991 │   │   │   │   │   offload_weight(param, param_name, state_dict_fold │
│    992 │   │   │   │   else:                                                 │
│ ❱  993 │   │   │   │   │   set_module_tensor_to_device(model, param_name, pa │
│    994 │   │                                                                 │
│    995 │   │   # Force Python to clean up.                                   │
│    996 │   │   del checkpoint                                                │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py:135 in  │
│ set_module_tensor_to_device                                                  │
│                                                                              │
│    132 │   │   tensor_name = splits[-1]                                      │
│    133 │                                                                     │
│    134 │   if tensor_name not in module._parameters and tensor_name not in m │
│ ❱  135 │   │   raise ValueError(f"{module} does not have a parameter or a bu │
│    136 │   is_buffer = tensor_name in module._buffers                        │
│    137 │   old_value = getattr(module, tensor_name)                          │
│    138                                                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer 
named qzeros.

I'm doing it in colab, I hope that doesn't cause any issues.

johnsmith0031 / alpaca_lora_4bit

ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros. #105