❯ ./run_server.sh
Loading ./llama-7b-4bit-v2/llama-7b-4bit-ts-ao-g128-v2.safetensors ...
Loading Model ...
/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(checkpoint_file, framework="pt") as f:
/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = cls(wrap_storage=untyped_storage)
The safetensors archive passed at ./llama-7b-4bit-v2/llama-7b-4bit-ts-ao-g128-v2.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/utils/modeling.py:820: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(checkpoint_file, framework="pt", device=device) as f:
Traceback (most recent call last):
  File "/home/nyculiao/liao/alpaca_lora_4bit/./scripts/run_server.py", line 26, in <module>
    server.run()
  File "/home/nyculiao/liao/alpaca_lora_4bit/model_server/server.py", line 147, in run
    self.load_model()
  File "/home/nyculiao/liao/alpaca_lora_4bit/model_server/server.py", line 79, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(self.config_path, self.model_path, groupsize=self.groupsize, is_v1_model=self.is_v1_model)
  File "/home/nyculiao/liao/alpaca_lora_4bit/autograd_4bit.py", line 204, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named g_idx.
How do I fix the following problem?
"ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named g_idx."
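For context, the traceback shows the failure inside load_llama_model_4bit_low_ram: the checkpoint contains per-layer g_idx tensors, but the Autograd4bitQuantLinear modules were constructed without a g_idx buffer, which is what happens when the loader runs in v1 mode. A sketch of the call with flags matched to the filename's "g128" / "v2" naming follows; this is an assumption based on the signature visible in the traceback, not a confirmed fix, and config_path is a placeholder:

```python
# Hypothetical sketch: match the loader flags to the checkpoint format.
# "llama-7b-4bit-ts-ao-g128-v2.safetensors" suggests groupsize 128 and a
# v2-format quantization whose layers carry a g_idx tensor, so loading it
# with is_v1_model=True produces exactly this missing-buffer error.
from autograd_4bit import load_llama_model_4bit_low_ram

model, tokenizer = load_llama_model_4bit_low_ram(
    config_path,  # placeholder: path to the model's HF config directory
    "./llama-7b-4bit-v2/llama-7b-4bit-ts-ao-g128-v2.safetensors",
    groupsize=128,      # matches "g128" in the filename
    is_v1_model=False,  # v2 checkpoints store g_idx per quantized layer
)
```

If run_server.py takes these values from a config, check that its groupsize and is_v1_model settings agree with the checkpoint being loaded.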