pytorch / torchtune

A Native-PyTorch Library for LLM Fine-tuning
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

generate is correct, but generate from quantization gets an error #1148

Open artisanclouddev opened 2 months ago

artisanclouddev commented 2 months ago

I trained for 8 epochs and finally got the last .pt file.

I was following this documentation: Llama3 in torchtune.

I have succeeded in evaluating and generating with my fine-tuned Llama3-8B model.

But I have run into 2 issues:

1) When I use a prompt with a simple question in the config file and instruct_template set to null, it works.

But when I tried to use the instruct_template:

prompt: 
      instruction: "xxxxxxxx"
      input: "xxxxxx"
instruct_template: torchtune.custom.apqp.fmea.FMEAInstructTemplate

in the same format I used when training on the sample data with FMEAInstructTemplate,

it got this error:

  File "/home/agiuser/workspace/torchtune/torchtune/config/_utils.py", line 104, in _get_component_from_path
    raise InstantiationError(
torchtune.config._errors.InstantiationError: Error loading 'torchtune.custom.apqp.fmea.FMEAInstructTemplate': ImportError("cannot import name 'Tokenizer' from 'torchtune.modules.tokenizers' (/home/agiuser/workspace/torchtune/torchtune/modules/tokenizers/__init__.py)")


2) When I tried quantization to test faster generation, I got meta_model_2-4w.pt, whose file size is 4.6G:

-rw-rw-r-- 1 agiuser agiuser 6.6M Jul 7 19:03 adapter_2.pt
-rw-rw-r-- 1 agiuser agiuser 12K  Jul 7 18:53 log_1720358633.txt
-rw-rw-r-- 1 agiuser agiuser 4.6G Jul 8 13:15 meta_model_2-4w.pt
-rw-rw-r-- 1 agiuser agiuser 15G  Jul 7 19:03 meta_model_2.pt
-rw-rw-r-- 1 agiuser agiuser 14M  Jul 7 17:09 recipe_state.pt

But when I tried to run the generation command with the quantized checkpoint, I got the following:

2024-07-08:13:29:44,912 INFO [_utils.py:33] Running InferenceRecipe with resolved config:

chat_format: null
checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: ./tuned_checkpoints/apqp/fmea/10_epochs
  checkpoint_files:

2024-07-08:13:29:45,062 DEBUG [seed.py:60] Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0

Traceback (most recent call last):
  File "/home/agiuser/workspace/torchtune/torchtune/models/convert_weights.py", line 54, in get_mapped_key
    new_key = mapping_dict[abstract_key]
KeyError: 'layers.{}.sa_norm.scale'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/agiuser/miniconda3/envs/torchtune/bin/tune", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/agiuser/workspace/torchtune/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/home/agiuser/workspace/torchtune/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/home/agiuser/workspace/torchtune/torchtune/_cli/run.py", line 179, in _run_cmd
    self._run_single_device(args)
  File "/home/agiuser/workspace/torchtune/torchtune/_cli/run.py", line 93, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "<frozen runpy>", line 286, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/home/agiuser/workspace/torchtune/recipes/generate.py", line 203, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/agiuser/workspace/torchtune/torchtune/config/_parse.py", line 50, in wrapper
    sys.exit(recipe_main(conf))
             ^^^^^^^^^^^^^^^^^
  File "/home/agiuser/workspace/torchtune/recipes/generate.py", line 198, in main
    recipe.setup(cfg=cfg)
  File "/home/agiuser/workspace/torchtune/recipes/generate.py", line 48, in setup
    ckpt_dict = checkpointer.load_checkpoint()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agiuser/workspace/torchtune/torchtune/utils/_checkpointing/_checkpointer.py", line 673, in load_checkpoint
    state_dict[utils.MODEL_KEY] = convert_weights.meta_to_tune(model_state_dict)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agiuser/workspace/torchtune/torchtune/models/convert_weights.py", line 85, in meta_to_tune
    new_key = get_mapped_key(key, _FROM_META)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agiuser/workspace/torchtune/torchtune/models/convert_weights.py", line 59, in get_mapped_key
    raise Exception(
Exception: Error converting the state dict. Found unexpected key: "layers.0.sa_norm.scale". Please make sure you're loading a checkpoint with the right format. 

Please let me know if there is anything I can do to fix these 2 issues.
Matrix-X commented 2 months ago

BTW, I converted the tuned model directly to a HuggingFace model. Using llama.cpp, I ran convert-hf-to-gguf.py to convert the HF format to GGUF, which succeeded, and I also used the ./llama-quantize script to quantize the model to a 4-bit GGUF file.

I also succeeded in loading both the GGUF model and the 4-bit quantized GGUF model into Ollama.

I ran both models with Ollama and successfully got my results.
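
Roughly, the llama.cpp steps look like the sketch below (the paths, file names, and quantization type are placeholders, not my exact values):

# convert the HuggingFace checkpoint directory to a GGUF file, then quantize it to 4-bit
python convert-hf-to-gguf.py ./my_hf_model_dir --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-q4_0.gguf q4_0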

ebsmothers commented 2 months ago

Hi @artisanclouddev sorry for the delayed response here. On (1): are you running on a pip installed version of torchtune or a git cloned one? (And if pip installed, are you on nightly or 0.1?) I ask because we made some changes to our tokenizers in #1082 and in the process Tokenizer was renamed to BaseTokenizer.
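
If that rename is the cause, the fix on your side is roughly a one-line import change, sketched below (assuming your custom module, or something it imports, still references the old name):

# hypothetical sketch: wherever the custom code previously did
#   from torchtune.modules.tokenizers import Tokenizer
# it should now import the renamed class instead
from torchtune.modules.tokenizers import BaseTokenizer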

On (2): our quantization script puts models into torchtune format only, so I think you will need to change from your usage of FullModelMetaCheckpointer to FullModelTorchTuneCheckpointer. This is discussed a bit in this section of the Llama3 tutorial; you can see this code block specifically for running generation after quantization:

checkpointer:
  # we need to use the custom torchtune checkpointer
  # instead of the HF checkpointer for loading
  # quantized models
  _component_: torchtune.utils.FullModelTorchTuneCheckpointer

  # directory with the checkpoint files
  # this should match the output_dir specified during
  # fine-tuning
  checkpoint_dir: <checkpoint_dir>

  # checkpoint files point to the quantized model
  checkpoint_files: [
    consolidated-4w.pt,
  ]

  output_dir: <checkpoint_dir>
  model_type: LLAMA3

# we also need to update the quantizer to what was used during
# quantization
quantizer:
  _component_: torchtune.utils.quantization.Int4WeightOnlyQuantizer
  groupsize: 256
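
With that config saved, generation can be launched the same way as before, roughly (the config filename below is a placeholder for wherever you saved the YAML):

tune run generate --config ./custom_generation_config.yaml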
artisanclouddev commented 2 months ago

Hi @ebsmothers, thanks for your reply.

> Hi @artisanclouddev sorry for the delayed response here. On (1): are you running on a pip installed version of torchtune or a git cloned one? (And if pip installed, are you on nightly or 0.1?) I ask because we made some changes to our tokenizers in #1082 and in the process Tokenizer was renamed to BaseTokenizer.

I was using the git clone and installed it with

pip install -e .

> On (2): our quantization script puts models into torchtune format only, so I think you will need to change from your usage of FullModelMetaCheckpointer to FullModelTorchTuneCheckpointer. This is discussed a bit in this section of the Llama3 tutorial; you can see this code block specifically for running generation after quantization:

I have tried using FullModelTorchTuneCheckpointer and it works, thanks!

ebsmothers commented 2 months ago

Glad to hear (2) is resolved. For (1), can you confirm the content of torchtune/modules/tokenizers/__init__.py in your local install? In this case the best thing to do may just be to git pull and update your instruct template accordingly. If you're still stuck and willing to share the custom instruct template I can take a look at that and let you know if anything looks amiss.
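
One quick way to sanity-check which tokenizers module Python is actually picking up (for example, if an older pip install is shadowing your editable clone) is roughly:

# rough check; run this in the same environment you use for tune run
import torchtune.modules.tokenizers as tokenizers

print(tokenizers.__file__)  # should point into your git clone, not an old site-packages install
print(tokenizers.__all__)   # should list "BaseTokenizer" rather than "Tokenizer"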

Matrix-X commented 2 months ago

> Glad to hear (2) is resolved. For (1), can you confirm the content of torchtune/modules/tokenizers/__init__.py in your local install? In this case the best thing to do may just be to git pull and update your instruct template accordingly. If you're still stuck and willing to share the custom instruct template I can take a look at that and let you know if anything looks amiss.

Here is the content I used:

# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from ._sentencepiece import SentencePieceBaseTokenizer
from ._tiktoken import TikTokenBaseTokenizer
from ._utils import (
    BaseTokenizer,
    ModelTokenizer,
    parse_hf_tokenizer_json,
    tokenize_messages_no_special_tokens,
)

__all__ = [
    "SentencePieceBaseTokenizer",
    "TikTokenBaseTokenizer",
    "ModelTokenizer",
    "BaseTokenizer",
    "tokenize_messages_no_special_tokens",
    "parse_hf_tokenizer_json",
]

I have also pulled again after you mentioned it; the content is the same as above.

Here is my custom instruct template:

from torchtune.data import InstructTemplate

from typing import Any, Dict, Mapping, Optional

class FMEAInstructTemplate(InstructTemplate):
    """
    Prompt template for FMEA dataset

    .. code-block:: text

        Below is an instruction that describes a task, paired with an input that provides further context.
        Write a response that appropriately completes the request.

        ### Instruction:
        <YOUR INSTRUCTION HERE>

        ### Input:
        <YOUR INPUT HERE>

        ### Response:

    """

    template = {
        "prompt_input": (
            "Below is an instruction that describes a task, paired with an input that provides further context. "
            "Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
        ),
    }

    @classmethod
    def format(
            cls, sample: Mapping[str, Any], column_map: Optional[Dict[str, str]] = None
    ) -> str:
        """
        Generate prompt from instruction and input.

        Args:
            sample (Mapping[str, Any]): a single data sample with instruction
            column_map (Optional[Dict[str, str]]): a mapping from the expected placeholder names
                in the template to the column names in the sample. If None, assume these are identical.

        Examples:
            >>> # the sample only needs an "input" column; the instruction is hard-coded below
            >>> FMEAInstructTemplate.format(sample={"input": "xxxxxx"})
            Below is an instruction that describes a task, paired with an input that provides further context.
            Write a response that appropriately completes the request.\\n\\n### Instruction:\\n...\\n\\n### Input:\\nxxxxxx\\n\\n### Response:\\n

        Returns:
            The formatted prompt
        """
        column_map = column_map or {}
        key_input = column_map.get("input", "input")
        # key_instruction = column_map.get("instruction", "instruction")
        instruction = """

               xxxxxxxx  output format xxxxxxxx

                """
        prompt = cls.template["prompt_input"].format(
            # instruction=sample[key_instruction],
            instruction=instruction,
            input=sample[key_input]
        )

        return prompt
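
For reference, a quick smoke test of the rendered prompt outside the recipe looks like this (the input value is just a placeholder):

# minimal check of the template output; the sample value is a placeholder
sample = {"input": "xxxxxx"}
print(FMEAInstructTemplate.format(sample))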