linlong1314 opened this issue 1 year ago

How can I run a trained model (Projects/add/model.pt)? Running test_huggingface_import.py directly reports:

    File ".\minGPT-master\mingpt\model.py", line 202, in from_pretrained
        assert len(keys) == len(sd)
This is because of the custom implementation of multi-head attention. The CausalSelfAttention module registers a buffer to ensure that attention is only applied to tokens to the left in the input sequence. However, state_dict() returns buffers as part of the model's state. PyTorch's native MultiheadAttention module is implemented differently, and its mask is not part of the state dict; that's why the assertion fails.
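To see this concretely, here is a standalone toy module (not minGPT code) showing that a buffer registered with the default persistent=True lands in state_dict():

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        # registered with the default persistent=True,
        # so it is included in state_dict()
        self.register_buffer("bias", torch.tril(torch.ones(4, 4)))

print(list(Toy().state_dict().keys()))
# ['bias', 'proj.weight', 'proj.bias']  <- the mask buffer shows up as an extra key
```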
You can fix that by adding the flag persistent=False when the buffer is registered in the __init__ function of the CausalSelfAttention module.
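A minimal sketch of that change, following minGPT's CausalSelfAttention (the buffer name bias and the tril mask shape match the repository; the Config dataclass is a stand-in for illustration, and the attention layers themselves are omitted):

```python
from dataclasses import dataclass

import torch
import torch.nn as nn

@dataclass
class Config:          # stand-in for minGPT's config object
    block_size: int = 8

class CausalSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        # ... attention projections omitted for brevity ...
        # causal mask: lower-triangular matrix so each token can only attend
        # to positions at or before its own. persistent=False keeps the buffer
        # out of state_dict(), so the key count matches the Hugging Face checkpoint.
        self.register_buffer(
            "bias",
            torch.tril(torch.ones(config.block_size, config.block_size))
                 .view(1, 1, config.block_size, config.block_size),
            persistent=False,
        )

print(list(CausalSelfAttention(Config()).state_dict().keys()))
# []  <- the mask no longer appears in the state dict
```

Note that persistent=False also means the mask is not written to checkpoints, which is fine here because it is deterministic and rebuilt in __init__. Checkpoints saved before this change will still contain the extra key, so loading them requires strict=False or removing the key first.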