❯ ./run_server.sh
Loading ./llama-7b-4bit-v2/llama-7b-4bit-ts-ao-g128-v2.safetensors ...
Loading Model ...
/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(checkpoint_file, framework="pt") as f:
/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = cls(wrap_storage=untyped_storage)
The safetensors archive passed at ./llama-7b-4bit-v2/llama-7b-4bit-ts-ao-g128-v2.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/utils/modeling.py:820: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(checkpoint_file, framework="pt", device=device) as f:
Traceback (most recent call last):
  File "/home/nyculiao/liao/alpaca_lora_4bit/./scripts/run_server.py", line 26, in <module>
    server.run()
  File "/home/nyculiao/liao/alpaca_lora_4bit/model_server/server.py", line 147, in run
    self.load_model()
  File "/home/nyculiao/liao/alpaca_lora_4bit/model_server/server.py", line 79, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(self.config_path, self.model_path, groupsize=self.groupsize, is_v1_model=self.is_v1_model)
  File "/home/nyculiao/liao/alpaca_lora_4bit/autograd_4bit.py", line 204, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/nyculiao/anaconda3/envs/pytorch/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named g_idx.
How do I fix the following problem?
"ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named g_idx."
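For context, the traceback shows the failure inside load_llama_model_4bit_low_ram: the checkpoint contains per-layer g_idx tensors, but the Autograd4bitQuantLinear modules were constructed without a g_idx buffer, which is what happens when the loader runs in v1 mode. A sketch of the call with flags matched to the filename's "g128" / "v2" naming follows; this is an assumption based on the signature visible in the traceback, not a confirmed fix, and config_path is a placeholder:

```python
# Hypothetical sketch: match the loader flags to the checkpoint format.
# "llama-7b-4bit-ts-ao-g128-v2.safetensors" suggests groupsize 128 and a
# v2-format quantization whose layers carry a g_idx tensor, so loading it
# with is_v1_model=True produces exactly this missing-buffer error.
from autograd_4bit import load_llama_model_4bit_low_ram

model, tokenizer = load_llama_model_4bit_low_ram(
    config_path,  # placeholder: path to the model's HF config directory
    "./llama-7b-4bit-v2/llama-7b-4bit-ts-ao-g128-v2.safetensors",
    groupsize=128,      # matches "g128" in the filename
    is_v1_model=False,  # v2 checkpoints store g_idx per quantized layer
)
```

If run_server.py takes these values from a config, check that its groupsize and is_v1_model settings agree with the checkpoint being loaded.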