karpathy / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
MIT License

AssertionError when running generate.ipynb with default parameters #120

Open jacquesqiao opened 1 year ago

jacquesqiao commented 1 year ago
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[14], line 2
      1 if use_mingpt:
----> 2     model = GPT.from_pretrained(model_type)
      3 else:
      4     model = GPT2LMHeadModel.from_pretrained(model_type)

File ~/project/llm/minGPT/mingpt/model.py:200, in GPT.from_pretrained(cls, model_type)
    197 transposed = ['attn.c_attn.weight', 'attn.c_proj.weight', 'mlp.c_fc.weight', 'mlp.c_proj.weight']
    198 # basically the openai checkpoints use a "Conv1D" module, but we only want to use a vanilla nn.Linear.
    199 # this means that we have to transpose these weights when we import them
--> 200 assert len(keys) == len(sd)
    201 for k in keys:
    202     if any(k.endswith(w) for w in transposed):
    203         # special treatment for the Conv1D weights we need to transpose

AssertionError:

hjwdzh commented 1 year ago

Same problem here. Maybe Hugging Face updated their pretrained model? Did you find a solution?

ydyjya commented 1 year ago

I encountered the same problem and found that it was caused by a mismatch in the number of parameters. I then compared the parameters of sd and sd_hf. The problem seems to be caused by Hugging Face updating the GPT2Attention source code. I added self.register_buffer("masked_bias", torch.tensor(-1e4), persistent=False) in model.py, and that solved it!
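
For anyone who wants to repeat the comparison of sd and sd_hf described above, here is a minimal sketch. It assumes both objects behave like ordinary dicts keyed by parameter name, which is how PyTorch state dicts behave. The helper name diff_state_dict_keys and the toy key names are made up for illustration and are not part of minGPT or transformers.

```python
def diff_state_dict_keys(sd, sd_hf):
    """Return (keys only in sd, keys only in sd_hf) as sorted lists."""
    keys, keys_hf = set(sd), set(sd_hf)
    return sorted(keys - keys_hf), sorted(keys_hf - keys)


# Toy stand-ins: real state dicts map these names to tensors.
# The key names below are illustrative, not copied from an actual checkpoint.
sd = {
    "transformer.h.0.attn.c_attn.weight": None,
    "transformer.h.0.attn.bias": None,  # causal-mask buffer on the minGPT side
}
sd_hf = {
    "transformer.h.0.attn.c_attn.weight": None,
}

only_in_sd, only_in_hf = diff_state_dict_keys(sd, sd_hf)
print("only in sd:   ", only_in_sd)
print("only in sd_hf:", only_in_hf)
```

In from_pretrained you would pass the real sd and sd_hf instead of the toy dicts. Any key that shows up on only one side is a candidate for the length mismatch that trips the assertion.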

jasonwvasquez commented 1 year ago

Where did you add that line of code in model.py?

ToddMorrill commented 11 months ago

My fix was the following in model.py.

# attn.bias isn't in the hugging face state dict, so we can't check for it
assert len(keys) == len([k for k in sd if not k.endswith('.attn.bias')])
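
To see what the filtered check above does, here is a small self-contained sketch on toy dicts. The helper name comparable_keys and the key names are hypothetical. The one point it is meant to show is that the '.attn.bias' suffix skips only the mask buffer, while a linear-layer bias such as c_attn.bias is still counted.

```python
def comparable_keys(sd, ignored_suffix=".attn.bias"):
    """Keys of sd that are expected to have a counterpart on the Hugging Face side."""
    return [k for k in sd if not k.endswith(ignored_suffix)]


# Toy stand-in for the minGPT state dict (values would be tensors in practice).
sd = {
    "transformer.h.0.attn.bias": None,         # mask buffer: skipped by the filter
    "transformer.h.0.attn.c_attn.bias": None,  # linear-layer bias: kept
    "transformer.h.0.attn.c_attn.weight": None,
}
# Toy stand-in for the filtered Hugging Face key list.
keys = [
    "transformer.h.0.attn.c_attn.bias",
    "transformer.h.0.attn.c_attn.weight",
]

# Same shape as the assertion in the fix: compare counts after filtering.
assert len(keys) == len(comparable_keys(sd))
```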