ltoniazzi / reduce-llms-for-testing

Reduce LLMs size for testing

Try fix gemma-2 (not working) #5

Closed · ltoniazzi closed 3 months ago

ltoniazzi commented 3 months ago

Investigate the llama.cpp code first, as the other architectures are working.

It feels like one issue is that lm_head is skipped when converting. This snippet in convert_hf_to_gguf.py:

    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
        del bid  # unused

        # lm_head is not used in llama.cpp, while autoawq will include this tensor in model
        # To prevent errors, skip loading lm_head.weight.
        if name == "lm_head.weight":
            logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
            return []

        # ref: https://github.com/huggingface/transformers/blob/fc37f38915372c15992b540dfcbbe00a916d4fc6/src/transformers/models/gemma/modeling_gemma.py#L89
        if name.endswith("norm.weight"):
            data_torch = data_torch + 1

        return [(self.map_tensor_name(name), data_torch)]
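For context, a quick way to confirm that skipping lm_head.weight is intentional is to check whether Gemma ties lm_head to the input embeddings, in which case llama.cpp can reuse token_embd for the output projection. A minimal sketch (the model id and the use of transformers here are my assumptions):

    # Sketch: verify that Gemma's lm_head shares storage with the token
    # embeddings, which would make skipping lm_head.weight during conversion safe.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")

    # Gemma configs set tie_word_embeddings=True ...
    print(model.config.tie_word_embeddings)
    # ... so lm_head.weight should point at the same storage as the embeddings.
    print(model.lm_head.weight.data_ptr() == model.get_input_embeddings().weight.data_ptr())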

Questions:

It feels like the last weight is not converted for Gemma? (Phi3 has output.weight along with output_norm.weight.) [screenshot: tensor lists comparing Gemma and Phi3] One way to check is sketched below.
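To see which tensors actually made it into the converted file, the gguf-py reader can list them directly. A sketch (the file path is a placeholder):

    # Sketch: list the tensor names in the converted GGUF file to check
    # whether output.weight is present alongside output_norm.weight.
    from gguf import GGUFReader

    reader = GGUFReader("gemma-2.gguf")
    for tensor in reader.tensors:
        print(tensor.name)  # look for "output.weight" vs "output_norm.weight"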

ltoniazzi commented 3 months ago

Note: could it be related to this issue? https://github.com/unslothai/unsloth/issues/869 (I also had problems with merge/unload.)
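For reference, the merge/unload step referred to above would look roughly like this with PEFT (a sketch; the model id and adapter path are placeholders):

    # Sketch: merge a LoRA adapter into the base model with PEFT, the step
    # where the linked unsloth issue reports problems.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")
    model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
    merged = model.merge_and_unload()  # fold the LoRA weights into the base model
    merged.save_pretrained("merged-gemma-2")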