huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Mergekit support for GPT2 error #29184

Closed: NamburiSrinath closed this issue 5 months ago

NamburiSrinath commented 7 months ago

Feature request

Hi,

I am working on model merging with mergekit (https://huggingface.co/blog/mlabonne/merge-models) and tried to merge two fine-tuned GPT2 models, but I am getting the following errors:

SLERP:

slices:
  - sources:
      - model: /hdd4/srinath2/Trading_Agent/model_merging/model1_training/checkpoint-3750
        layer_range: [0, 12]
      - model: /hdd4/srinath2/Trading_Agent/model_merging/model2_training/checkpoint-3750
        layer_range: [0, 12]
merge_method: slerp
base_model: openai-community/gpt2
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16

Error - RuntimeError: Tensor ln_f.bias required but not present in model /hdd4/srinath2/Trading_Agent/model_merging/model2_training/checkpoint-3750
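As a sanity check, here is a rough diagnostic sketch (not mergekit code) that lists the tensor names stored in one of the checkpoints, to confirm whether ln_f.bias is actually there. The weight filename is an assumption; depending on how the Trainer saved the checkpoint, it may be model.safetensors or pytorch_model.bin.

import os
import torch
from safetensors import safe_open

# Checkpoint path taken from the SLERP config above.
ckpt_dir = "/hdd4/srinath2/Trading_Agent/model_merging/model2_training/checkpoint-3750"
st_path = os.path.join(ckpt_dir, "model.safetensors")   # filename is an assumption
bin_path = os.path.join(ckpt_dir, "pytorch_model.bin")  # older-style filename

if os.path.exists(st_path):
    with safe_open(st_path, framework="pt") as f:
        keys = list(f.keys())
else:
    keys = list(torch.load(bin_path, map_location="cpu").keys())

# Does the checkpoint actually contain the final layer-norm bias?
print("ln_f.bias present:", any(k.endswith("ln_f.bias") for k in keys))
print(sorted(keys)[:10])  # first few tensor names, just for inspection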

TIES:

models:
  - model: openai-community/gpt2
    # no parameters necessary for base model
  - model: /hdd4/srinath2/Trading_Agent/model_merging/model1_training/checkpoint-3750
    parameters:
      density: 0.5
      weight: 0.5
  - model: /hdd4/srinath2/Trading_Agent/model_merging/model2_training/checkpoint-3750
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: openai-community/gpt2
parameters:
  normalize: true
dtype: float16

Error - RuntimeError: Tensor score.weight required but not present in model openai-community/gpt2

Same error for DARE.

Note: the moment I change the base model to one of the fine-tuned models, I am able to merge the fine-tuned GPT2 models. I am not sure if this is expected, because the blog post uses a base model that is different from the other two models.
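My guess (not confirmed) is that score.weight is the classification head that GPT2ForSequenceClassification adds, which the plain openai-community/gpt2 LM checkpoint does not have, so the tensor names of the base and fine-tuned models don't line up. Here is a rough sketch to compare parameter names, assuming the fine-tuned checkpoints were saved with a sequence-classification head (if they are causal-LM checkpoints, the second load would need to change):

from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification

# Base LM checkpoint used as base_model in the configs above.
base = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# Assumption: the fine-tuned checkpoint carries a classification head (score.weight).
finetuned = AutoModelForSequenceClassification.from_pretrained(
    "/hdd4/srinath2/Trading_Agent/model_merging/model1_training/checkpoint-3750"
)

base_keys = set(base.state_dict().keys())
ft_keys = set(finetuned.state_dict().keys())

# Tensors that exist in one model but not the other; these are the ones a merge
# method that walks the base model's tensor list would fail to find.
print("Only in fine-tuned checkpoint:", sorted(ft_keys - base_keys))
print("Only in base model:", sorted(base_keys - ft_keys))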

Motivation

I fine-tuned GPT2 on two tasks (model1 and model2) and want to merge them to see whether the merged model is better than the base model on these tasks.

Your contribution

If there are any inconsistencies, I can contribute to editing the blog post or open a PR if needed!

amyeroberts commented 6 months ago

Hi @NamburiSrinath, thanks for raising an issue.

It's hard to tell from the information here, but this looks like an error in the mergekit library rather than anything to do with transformers.

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.