huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0

KeyError: 'lm_head.weight_qtype' when loading the quanto model #213

Closed · RanchiZhao closed 1 week ago

RanchiZhao commented 2 weeks ago

Hey there! I was trying to load a quanto model using `model.load_state_dict(safe_load(model_location))`, but:

```
Loading state dict
Traceback (most recent call last):
  File "/local/apps/quanto/examples/run_saved_quant.py", line 23, in <module>
    model.load_state_dict(safe_load(model_location))
  File "/home/jeeves/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2175, in load_state_dict
    load(self, state_dict)
  File "/home/jeeves/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2163, in load
    load(child, child_state_dict, child_prefix)  # noqa: F821
  File "/home/jeeves/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2157, in load
    module._load_from_state_dict(
  File "/home/jeeves/.local/lib/python3.10/site-packages/optimum/quanto/nn/qmodule.py", line 157, in _load_from_state_dict
    weight_qtype = state_dict.pop(prefix + "weight_qtype")
KeyError: 'lm_head.weight_qtype'
```

When using quanto for model quantization, I notice that the lm_head and the embeddings aren't being quantized. It seems this case wasn't accounted for when loading the model back.
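
For reference, this is roughly how I check which modules were actually quantized (a minimal sketch, assuming `QModuleMixin` is importable from `optimum.quanto.nn`, the module shown in the traceback):

```python
from optimum.quanto.nn import QModuleMixin

# List the modules that quantize() actually converted; for my model,
# lm_head and the embeddings do not show up here.
for name, module in model.named_modules():
    if isinstance(module, QModuleMixin):
        print(name, module.weight_qtype)
```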

dacorvo commented 2 weeks ago

When reloading a state_dict, the list of quantized modules must match between the target model and the state_dict. Can you share a larger code snippet?
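
For illustration, a save/reload round trip that keeps the two sides consistent might look like this (a minimal sketch using the `quantize`/`freeze` API and the `safe_save`/`safe_load` helpers mentioned in this thread; the model variables are placeholders):

```python
from optimum.quanto import quantize, freeze, qint8, safe_save, safe_load

# Saving side: quantize the model, freeze the quantized weights,
# then serialize the state_dict.
quantize(model, weights=qint8)
freeze(model)
safe_save(model.state_dict(), "model.safetensors")

# Loading side: the fresh model must be quantized with the *same*
# call before load_state_dict, so that both sides expect the same
# set of quantized modules (and the same weight_qtype entries).
quantize(fresh_model, weights=qint8)
fresh_model.load_state_dict(safe_load("model.safetensors"))
```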

RanchiZhao commented 2 weeks ago

By the way, it looks like the `_save_to_state_dict` and `_load_from_state_dict` functions support a `weight_qtype` of `None` during save and load operations. However, I still run into issues when using `safe_save` to store the model.
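
To illustrate what I mean, this is roughly how I inspect the serialized entries (a sketch; the key pattern follows the `prefix + "weight_qtype"` lookup from the traceback):

```python
sd = model.state_dict()

# Every quantized module records a <prefix>.weight_qtype entry; an
# unquantized lm_head contributes no 'lm_head.weight_qtype' key,
# which is exactly the key the loader fails on.
print([k for k in sd if k.endswith("weight_qtype")])
```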

RanchiZhao commented 2 weeks ago

> When reloading a state_dict, the list of quantized modules must match between the target model and the state_dict. Can you share a larger code snippet?

I'm currently only using the code from this link: https://github.com/huggingface/optimum-quanto/issues/136#issuecomment-2049419065.

Additionally, if my model's lm_head and embeddings share weights, I run into issues with `safe_save`: I have to make a copy of the lm_head weight before saving, even though this increases the model size.
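
Concretely, the copy I mean looks like this (a sketch, assuming a causal-LM whose `lm_head` is tied to the input embeddings):

```python
import torch

# Break the weight tying between lm_head and the input embeddings
# before saving; this duplicates the tensor, hence the larger file.
model.lm_head.weight = torch.nn.Parameter(model.lm_head.weight.clone())
safe_save(model.state_dict(), model_location)
```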

RanchiZhao commented 2 weeks ago

It seems the issue arises because, after I load the weights, the lm_head has to run its forward pass as a QLinear even though it has no quantized weights.
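
In other words (a sketch; `QLinear` is quanto's quantized linear layer from `optimum.quanto.nn`):

```python
from optimum.quanto.nn import QLinear

# After quantizing the whole model, lm_head is a QLinear, so its
# forward expects quantized weights, but the saved state_dict never
# contained them (nor 'lm_head.weight_qtype'), hence the KeyError.
print(isinstance(model.lm_head, QLinear))  # True after quantize(model, ...)
```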

RanchiZhao commented 1 week ago

> It seems the issue arises because, after I load the weights, the lm_head has to run its forward pass as a QLinear even though it has no quantized weights.

I think I've pinpointed the issue. During requantization, I was passing in `model` instead of `model.model`. This causes the lm_head to get quantized, but there are no corresponding weights for it in the state_dict.
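
So the fix on my side is to quantize only the inner transformer stack, leaving the lm_head unquantized as it was at save time (a sketch; `model.model` is the decoder stack in my architecture, adjust for yours):

```python
from optimum.quanto import quantize, qint8, safe_load

# Quantize the inner stack only, so the set of quantized modules
# matches what the saved state_dict actually contains.
quantize(model.model, weights=qint8)
model.load_state_dict(safe_load(model_location))
```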