def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
    del bid  # unused

    # lm_head is not used in llama.cpp, while autoawq will include this tensor in model
    # To prevent errors, skip loading lm_head.weight.
    if name == "lm_head.weight":
        logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
        return []

    # ref: https://github.com/huggingface/transformers/blob/fc37f38915372c15992b540dfcbbe00a916d4fc6/src/transformers/models/gemma/modeling_gemma.py#L89
    if name.endswith("norm.weight"):
        data_torch = data_torch + 1

    return [(self.map_tensor_name(name), data_torch)]
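For context on the `data_torch + 1` line: the transformers Gemma RMSNorm (see the ref link above) multiplies the normalized activations by (1 + weight), while llama.cpp applies the stored norm weight directly, so the converter bakes the +1 into the tensor. A minimal sketch of that equivalence, with illustrative names not taken from either codebase:

    import torch

    def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)

    x = torch.randn(4, 8)
    w = torch.randn(8) * 0.02  # HF checkpoint stores w; the model applies (1 + w)

    hf_style  = rms_norm(x) * (1.0 + w)   # what the transformers Gemma RMSNorm computes
    converted = rms_norm(x) * (w + 1.0)   # what a plain RMSNorm computes once `data_torch + 1` is stored

    assert torch.allclose(hf_style, converted)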
Questions:
Is lm_head skipped even when converting the base model?
How is this handled correctly for Phi3?
Why is the gemma2 base model not doing correct inference now? Should this be closed and the changes reverted to what they were before?
Feels like the last weight is not converted for gemma? (As Phi3 has output.weight along with output_norm.weight.)
Investigate the llama.cpp code first, since the other archs are working; a quick check of the converted file's tensors is sketched below.
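One way to check the "last weight is not converted" suspicion is to list the tensor names in the converted file and look for output.weight vs token_embd.weight. A sketch, assuming the gguf Python package that ships with llama.cpp and a hypothetical file path:

    from gguf import GGUFReader

    reader = GGUFReader("gemma-converted.gguf")  # hypothetical path to the converted model
    names = [t.name for t in reader.tensors]
    print("output.weight present:    ", "output.weight" in names)
    print("token_embd.weight present:", "token_embd.weight" in names)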
Feels like one issue is that lm_head is skipped when converting; that is what the convert_hf_to_gguf snippet quoted above does.
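On whether skipping lm_head can actually lose a weight: Gemma ties lm_head to the input embeddings, so the output projection can be recovered from token_embd.weight. A quick way to check that assumption (a sketch, assuming the transformers library is installed and using a Hub checkpoint name purely as an example):

    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("google/gemma-2b")  # example checkpoint, for illustration only
    # True would mean lm_head.weight is a copy of the embedding matrix,
    # so dropping it during conversion should not lose any information.
    print(cfg.tie_word_embeddings)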