Open amnonbleich opened 1 year ago
The root cause of the problem is that `persistent=False` is set for the `attn.bias` keys in the original Hugging Face code (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py, line 133). This means those keys are not included in the HF state dictionary, while they still are in minGPT's. That's why the assertion fails in line 200 of minGPT/model.py.
So a better solution is to pass the same `persistent=False` option when registering the `attn.bias` buffer in line 48 of minGPT/model.py, like this: `self.register_buffer("bias", torch.tril(torch.ones(config.block_size, config.block_size)).view(1, 1, config.block_size, config.block_size), persistent=False)`
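To illustrate why this works, here is a minimal sketch (not the actual minGPT class; the module name and `block_size=4` are made up for the demo) showing that a buffer registered with `persistent=False` is simply absent from `state_dict()`:

```python
import torch
import torch.nn as nn

class AttentionMaskDemo(nn.Module):
    # Hypothetical module holding only the causal-mask buffer, mirroring
    # how minGPT registers "bias" in CausalSelfAttention.
    def __init__(self, block_size=4, persistent=True):
        super().__init__()
        self.register_buffer(
            "bias",
            torch.tril(torch.ones(block_size, block_size))
                 .view(1, 1, block_size, block_size),
            persistent=persistent,
        )

# With the default persistent=True the buffer appears in state_dict();
# with persistent=False it is dropped, matching the HF GPT-2 behavior.
print("bias" in AttentionMaskDemo(persistent=True).state_dict())   # True
print("bias" in AttentionMaskDemo(persistent=False).state_dict())  # False
```

Note that the buffer still exists as `self.bias` in both cases and is usable in `forward`; `persistent=False` only affects serialization, which is exactly why the two state dicts then have matching key sets.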
Also, the `attn.masked_bias` keys get the same `persistent=False` option in the HF code, hence they aren't included in the HF state dictionary either. So excluding them in line 196 of minGPT/model.py is unnecessary. Consequently we don't need the `keys` variable at all; we can directly use `sd_hf` everywhere.
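A quick sketch of the simplification (the key names below are hypothetical stand-ins; a real checkpoint has many more entries, and the Conv1D weight transposition that minGPT also does is omitted):

```python
# Two toy state dicts standing in for the HF checkpoint and minGPT's model.
sd_hf = {"transformer.wte.weight": 2, "transformer.h.0.attn.c_attn.weight": 1}
sd    = {"transformer.wte.weight": 0, "transformer.h.0.attn.c_attn.weight": 0}

# Before: minGPT first built a filtered list, roughly
#   keys = [k for k in sd_hf if not k.endswith('attn.masked_bias')]
# After: once HF registers masked_bias with persistent=False, sd_hf never
# contains such keys, so the filter is a no-op and we can iterate sd_hf directly.
assert all(not k.endswith("attn.masked_bias") for k in sd_hf)
for k in sd_hf:
    sd[k] = sd_hf[k]  # copy each parameter over
```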
Bug fix - remove the `attn.bias` keys from the GPT state dict in `from_pretrained`; otherwise the assertion fails. If that's not a bug, I would be happy to hear the reasoning. In addition, the above-mentioned keys are not used elsewhere, only in the assertion.