Open amnonbleich opened 1 year ago
The root cause of the problem is that `persistent=False` is set for the `attn.bias` keys in the original Hugging Face code (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py, line 133). This means those keys are not included in the HF state dictionary, while they still are in minGPT's. That's why the assertion fails in line 200 of minGPT/model.py.
So a better solution is to pass the same `persistent=False` option when registering the `attn.bias` buffer in line 48 of minGPT/model.py, like this: `self.register_buffer("bias", torch.tril(torch.ones(config.block_size, config.block_size)).view(1, 1, config.block_size, config.block_size), persistent=False)`
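To illustrate why this works, here is a minimal sketch (not the actual minGPT class; the module name and `block_size=4` are made up for the demo) showing that a buffer registered with `persistent=False` is simply absent from `state_dict()`:

```python
import torch
import torch.nn as nn

class AttentionMaskDemo(nn.Module):
    # Hypothetical module holding only the causal-mask buffer, mirroring
    # how minGPT registers "bias" in CausalSelfAttention.
    def __init__(self, block_size=4, persistent=True):
        super().__init__()
        self.register_buffer(
            "bias",
            torch.tril(torch.ones(block_size, block_size))
                 .view(1, 1, block_size, block_size),
            persistent=persistent,
        )

# With the default persistent=True the buffer appears in state_dict();
# with persistent=False it is dropped, matching the HF GPT-2 behavior.
print("bias" in AttentionMaskDemo(persistent=True).state_dict())   # True
print("bias" in AttentionMaskDemo(persistent=False).state_dict())  # False
```

Note that the buffer still exists as `self.bias` in both cases and is usable in `forward`; `persistent=False` only affects serialization, which is exactly why the two state dicts then have matching key sets.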
Also, the `attn.masked_bias` keys get the same `persistent=False` option in the HF code, hence they aren't included in the HF state dictionary either. So excluding them in line 196 of minGPT/model.py is unnecessary. Consequently we don't need the `keys` variable at all; we can directly use `sd_hf` everywhere.
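A quick sketch of the simplification (the key names below are hypothetical stand-ins; a real checkpoint has many more entries, and the Conv1D weight transposition that minGPT also does is omitted):

```python
# Two toy state dicts standing in for the HF checkpoint and minGPT's model.
sd_hf = {"transformer.wte.weight": 2, "transformer.h.0.attn.c_attn.weight": 1}
sd    = {"transformer.wte.weight": 0, "transformer.h.0.attn.c_attn.weight": 0}

# Before: minGPT first built a filtered list, roughly
#   keys = [k for k in sd_hf if not k.endswith('attn.masked_bias')]
# After: once HF registers masked_bias with persistent=False, sd_hf never
# contains such keys, so the filter is a no-op and we can iterate sd_hf directly.
assert all(not k.endswith("attn.masked_bias") for k in sd_hf)
for k in sd_hf:
    sd[k] = sd_hf[k]  # copy each parameter over
```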
Bug fix - remove the `attn.bias` keys from the GPT state dict in `from_pretrained`; otherwise the assertion fails. If that's not a bug, I would be happy to hear the reasoning. In addition, the above-mentioned keys are not used elsewhere, only in the assertion.