jacquesqiao opened 1 year ago
Same problem here. Maybe huggingface updated their pretrained model? Did you find a solution?
I encountered the same problem and found it was caused by a mismatch in the count of parameters. I compared the parameters of sd and sd_hf, and the mismatch seems to be caused by a Hugging Face update to the GPT2Attention source code.
I added self.register_buffer("masked_bias", torch.tensor(-1e4), persistent=False)
in model.py, and that solved it!
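For context, a minimal sketch of where such a buffer registration typically lives. The module and dimensions here are illustrative stand-ins, not the actual model.py; the point is that persistent=False keeps the buffer usable at runtime while excluding it from the state dict, which is what newer Hugging Face checkpoints do:

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    # Illustrative GPT-2-style attention module; names mirror
    # nanoGPT/Hugging Face conventions but are assumptions.
    def __init__(self, n_embd=768, block_size=1024):
        super().__init__()
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)
        # Causal mask buffer; persistent=False keeps it OUT of state_dict,
        # matching newer Hugging Face checkpoints that dropped these keys.
        self.register_buffer(
            "bias",
            torch.tril(torch.ones(block_size, block_size))
                 .view(1, 1, block_size, block_size),
            persistent=False,
        )
        self.register_buffer("masked_bias", torch.tensor(-1e4), persistent=False)

m = CausalSelfAttention()
# Buffers exist at runtime but are absent from the saved state dict.
print("masked_bias" in dict(m.named_buffers()))            # buffer is present
print(any(k.endswith("masked_bias") for k in m.state_dict()))  # not saved
```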
Where did you add that line of code in model.py?
My fix was the following in model.py:
# attn.bias isn't in the hugging face state dict, so we can't check for it
assert len(keys) == len([k for k in sd if not k.endswith('.attn.bias')])
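A hedged sketch of the filtering idea behind that assert: drop the mask-buffer keys from the local state dict before comparing it to the Hugging Face one. The dictionaries below are toy stand-ins for the real state dicts (which would come from model.state_dict() and GPT2LMHeadModel.from_pretrained(...).state_dict()):

```python
# Toy stand-ins: local model state dict (sd) still has the mask buffers,
# while the Hugging Face checkpoint (sd_hf) no longer ships them.
sd = {
    "transformer.h.0.attn.c_attn.weight": 1,
    "transformer.h.0.attn.bias": 2,          # local-only causal-mask buffer
    "transformer.h.0.attn.masked_bias": 3,   # local-only buffer
}
sd_hf = {
    "transformer.h.0.attn.c_attn.weight": 1,
}

# Ignore buffers that newer Hugging Face checkpoints no longer include.
ignored = (".attn.bias", ".attn.masked_bias")
keys = [k for k in sd_hf if not k.endswith(ignored)]
local = [k for k in sd if not k.endswith(ignored)]
assert len(keys) == len(local), f"mismatched keys: {len(keys)} != {len(local)}"
print(local)  # ['transformer.h.0.attn.c_attn.weight']
```

Filtering both sides with the same suffix tuple keeps the length check valid whether or not a given checkpoint still carries the buffers.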