def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
    del bid  # unused

    # lm_head is not used in llama.cpp, while autoawq will include this tensor in model
    # To prevent errors, skip loading lm_head.weight.
    if name == "lm_head.weight":
        logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
        return []

    # ref: https://github.com/huggingface/transformers/blob/fc37f38915372c15992b540dfcbbe00a916d4fc6/src/transformers/models/gemma/modeling_gemma.py#L89
    if name.endswith("norm.weight"):
        data_torch = data_torch + 1

    return [(self.map_tensor_name(name), data_torch)]
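For context on the `data_torch + 1` line: the transformers Gemma RMSNorm (see the ref link above) multiplies the normalized activations by (1 + weight), while llama.cpp applies the stored norm weight directly, so the converter bakes the +1 into the tensor. A minimal sketch of that equivalence, with illustrative names not taken from either codebase:

    import torch

    def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)

    x = torch.randn(4, 8)
    w = torch.randn(8) * 0.02  # HF checkpoint stores w; the model applies (1 + w)

    hf_style  = rms_norm(x) * (1.0 + w)   # what the transformers Gemma RMSNorm computes
    converted = rms_norm(x) * (w + 1.0)   # what a plain RMSNorm computes once `data_torch + 1` is stored

    assert torch.allclose(hf_style, converted)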
Questions:
Is lm_head skipped even when converting the base model?
How is this handled correctly for Phi3?
Why is the gemma2 base model not doing correct inference now? Should this be closed and the changes reverted to what they were before?
Feels like the last weight is not converted for gemma? (As Phi3 has output.weight along with output_norm.weight.)
Investigate the llama.cpp code first, since the other archs are working; a quick check of the converted file's tensors is sketched below.
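One way to check the "last weight is not converted" suspicion is to list the tensor names in the converted file and look for output.weight vs token_embd.weight. A sketch, assuming the gguf Python package that ships with llama.cpp and a hypothetical file path:

    from gguf import GGUFReader

    reader = GGUFReader("gemma-converted.gguf")  # hypothetical path to the converted model
    names = [t.name for t in reader.tensors]
    print("output.weight present:    ", "output.weight" in names)
    print("token_embd.weight present:", "token_embd.weight" in names)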
Feels like one issue is that lm_head is skipped when converting; that is what the convert_hf_to_gguf snippet quoted above does.
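On whether skipping lm_head can actually lose a weight: Gemma ties lm_head to the input embeddings, so the output projection can be recovered from token_embd.weight. A quick way to check that assumption (a sketch, assuming the transformers library is installed and using a Hub checkpoint name purely as an example):

    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("google/gemma-2b")  # example checkpoint, for illustration only
    # True would mean lm_head.weight is a copy of the embedding matrix,
    # so dropping it during conversion should not lose any information.
    print(cfg.tie_word_embeddings)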