mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0

Converting checkpoints to HF post surgery Algos #1492

Closed Extirpater closed 2 months ago

Extirpater commented 2 months ago

I'm trying to pretrain the Replit code v1.5 model (https://huggingface.co/replit/replit-code-v1_5-3b), which uses prefix LM. I can load and train the model, but I'm having trouble converting the Composer Trainer checkpoints to HF. I get this error when I try to convert:

##############################
HF checkpoint folder successfully created at 2B-firstpass/HF/.
Loading model from 2B-firstpass/HF/
construction
<class 'llmfoundry.models.layers.norm.LPLayerNorm'>
construction
<class 'llmfoundry.models.layers.attention.GroupedQueryAttention'>
Traceback (most recent call last):
  File "/bit-replit/scripts/inference/convert_composer_to_hf.py", line 347, in <module>
    convert_composer_to_hf(parse_args())
  File "/bit-replit/scripts/inference/convert_composer_to_hf.py", line 338, in convert_composer_to_hf
    raise e
  File "/bit-replit/scripts/inference/convert_composer_to_hf.py", line 336, in convert_composer_to_hf
    _convert_composer_to_hf(args)
  File "/bit-replit/scripts/inference/convert_composer_to_hf.py", line 212, in _convert_composer_to_hf
    loaded_hf_model = MPTForCausalLM.from_pretrained(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/transformers/modeling_utils.py", line 3798, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/bit-replit/llmfoundry/models/mpt/modeling_mpt.py", line 1040, in __init__
    self.transformer: MPTModel = self.backbone_model_class(config)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/bit-replit/llmfoundry/models/mpt/modeling_mpt.py", line 414, in __init__
    self.blocks = self.construct_blocks(config=config,)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/bit-replit/llmfoundry/models/mpt/modeling_mpt.py", line 506, in construct_blocks
    return nn.ModuleList([
                         ^
  File "/bit-replit/llmfoundry/models/mpt/modeling_mpt.py", line 507, in <listcomp>
    self.block_class(
  File "/bit-replit/llmfoundry/models/layers/blocks.py", line 107, in __init__
    self.attn = build_attention_layer(
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/bit-replit/llmfoundry/models/layers/layer_builders.py", line 95, in build_attention_layer
    return construct_from_registry(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/bit-replit/llmfoundry/utils/registry_utils.py", line 162, in construct_from_registry
    constructed_item = registered_constructor(**kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: GroupedQueryAttention.__init__() got an unexpected keyword argument 'prefix_lm'
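
The exported config still seems to carry the prefix-LM attention setting, which the current GroupedQueryAttention no longer accepts. A quick way to confirm (the path is the one from the log above; the attn_config / prefix_lm field names are my assumption about the exported MPT config):

import json

# Inspect the HF config that convert_composer_to_hf.py wrote out
with open('2B-firstpass/HF/config.json') as f:
    cfg = json.load(f)

# If this prints True, the saved config still requests prefix LM
print(cfg.get('attn_config', {}).get('prefix_lm'))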

Looking at the convert_composer_to_hf.py file, it warns:

note:: This function will not work properly if you used surgery algorithms when you trained your model. In that case you will want to load the model weights using the Composer :class:~composer.Trainer with the load_path argument.

Could you provide an example of this?
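
Is the intended pattern something like the sketch below? (Untested; composer_model stands in for the same ComposerModel the train script builds from the YAML, and the checkpoint path is a placeholder.)

from composer import Trainer

composer_model = ...  # placeholder: the same ComposerModel the training YAML builds

# Let the Composer Trainer restore the weights from the training checkpoint
trainer = Trainer(
    model=composer_model,
    load_path='2B-firstpass/checkpoints/latest-rank0.pt',  # placeholder checkpoint path
    load_weights_only=True,
)

# trainer.state.model now holds the model with the loaded weights
model_with_weights = trainer.state.model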

dakinggg commented 2 months ago

We removed support for prefix LM in llmfoundry, so you'll need to go back to a commit that still supports it. I believe v0.6.0 should work.