When reloading a state_dict, the list of quantized modules must match between the target model and the state_dict. Can you share a larger code snippet?
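(For illustration, a minimal sketch of what that matching requirement looks like, assuming the usual quantize/freeze/safe_save workflow; MyModel is just a placeholder, and on older releases the same names are imported from the top-level quanto package rather than optimum.quanto:)

```python
from optimum.quanto import quantize, freeze, qint8, safe_save, safe_load

# Save side: quantize() decides which modules become quantized modules.
model = MyModel()                      # placeholder model class
quantize(model, weights=qint8)
freeze(model)
safe_save(model.state_dict(), "model.safetensors")

# Load side: the target model must be quantized the same way (same set of
# modules, same qtype) BEFORE load_state_dict is called, so the quantized
# modules in the model match those recorded in the state_dict.
model_reloaded = MyModel()
quantize(model_reloaded, weights=qint8)
model_reloaded.load_state_dict(safe_load("model.safetensors"))
```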
By the way, it looks like the _save_to_state_dict and _load_from_state_dict functions support handling weight_qtype as None during save and load operations. However, I seem to run into some issues when using safe_save to store the model.
I'm currently only using the code from this link: https://github.com/huggingface/optimum-quanto/issues/136#issuecomment-2049419065.
Additionally, if my model ties the lm_head and embedding weights, I run into issues with safe_save: I have to make a separate copy of the lm_head weight before saving, even though this increases the model size.
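(A rough sketch of that workaround, under the assumption that the failure comes from safetensors refusing to serialize tensors that share storage; untying the weight by cloning is what makes the checkpoint larger:)

```python
import torch

# Give lm_head its own storage before quantizing/saving, so the saved
# state_dict no longer contains a tensor that shares memory with the
# embedding. This duplicates the weight, hence the bigger file.
model.lm_head.weight = torch.nn.Parameter(model.lm_head.weight.detach().clone())
```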
It seems that the issue arises because, after I load the weights, the lm_head has to run its forward pass as a QLinear even though it has no quantized weights.
I think I've pinpointed the issue. During the requantization step I'm passing in model instead of model.model, so the lm_head gets quantized as well, but there are no corresponding quantized weights for it in the state_dict.
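(In other words, something along these lines, assuming a typical transformers causal LM where the decoder body lives under model.model and lm_head sits outside it; model_reloaded and model_location are placeholders carried over from the snippets above:)

```python
from optimum.quanto import quantize, qint8, safe_load

# Quantize only the decoder body, mirroring what was quantized before saving,
# so lm_head stays a plain nn.Linear and is not expected to have qweights.
quantize(model_reloaded.model, weights=qint8)   # not quantize(model_reloaded)
model_reloaded.load_state_dict(safe_load(model_location))
```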
Hey there! I was trying to load a quanto model using model.load_state_dict(safe_load(model_location)), but: when using quanto for model quantization, I notice that the lm_head and embedding aren't being quantized, and it seems like this case wasn't considered when loading the model.