v2 is slightly different from v1, and the latest v2 code hasn't been integrated with HF transformers yet. To try our v2 model, you need to use our official package for now. We will integrate our latest code with HF transformers soon.
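Roughly, usage with the official package follows the pattern in our README. A minimal sketch (the `pre_trained` key below is illustrative; check the README for the exact v2 identifier):

```python
import torch
from DeBERTa import deberta

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Illustrative pre-trained key; the README lists the supported names.
        self.deberta = deberta.DeBERTa(pre_trained='xxlarge-v2')
        self.deberta.apply_state()  # load the pre-trained weights

    def forward(self, input_ids):
        # Returns the encoder hidden states; see the README for the exact
        # output structure.
        return self.deberta(input_ids)
```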
Really sorry to be annoying about this, but I couldn't quite get the weights to line up with the code currently in this repo either:
```
encoder.layer.0.attention.self.q_bias: torch.Size([1536])
encoder.layer.0.attention.self.v_bias: torch.Size([1536])
encoder.layer.0.attention.self.in_proj.weight: torch.Size([4608, 1536])
encoder.layer.0.attention.self.pos_proj.weight: torch.Size([1536, 1536])
encoder.layer.0.attention.self.pos_q_proj.weight: torch.Size([1536, 1536])
encoder.layer.0.attention.self.pos_q_proj.bias: torch.Size([1536])
```

vs

```
deberta.encoder.layer.0.attention.self.query_proj.weight: torch.Size([1536, 1536])
deberta.encoder.layer.0.attention.self.query_proj.bias: torch.Size([1536])
deberta.encoder.layer.0.attention.self.key_proj.weight: torch.Size([1536, 1536])
deberta.encoder.layer.0.attention.self.key_proj.bias: torch.Size([1536])
deberta.encoder.layer.0.attention.self.value_proj.weight: torch.Size([1536, 1536])
deberta.encoder.layer.0.attention.self.value_proj.bias: torch.Size([1536])
```
It seems like they should match, but I wasn't quite sure what went where.
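For concreteness, this is the kind of remapping I was attempting. It assumes `in_proj.weight` packs the query/key/value projections contiguously along dim 0 (4608 = 3 × 1536), that `q_bias`/`v_bias` are the query and value biases with the key bias fixed at zero, and that nothing is interleaved per attention head, but I'm not sure those assumptions hold, which is basically my question:

```python
import torch

HIDDEN = 1536  # hidden size of the xxlarge-v2 checkpoint

# "pytorch_model.bin" is a placeholder path for the downloaded checkpoint.
state = torch.load("pytorch_model.bin", map_location="cpu")

prefix = "encoder.layer.0.attention.self"
new_prefix = "deberta.encoder.layer.0.attention.self"

# Assumption: rows 0..1535 are the query projection, 1536..3071 the key,
# 3072..4607 the value. If the checkpoint interleaves per head instead,
# reshape to (num_heads, 3, head_size, HIDDEN) before splitting.
q_w, k_w, v_w = state[f"{prefix}.in_proj.weight"].chunk(3, dim=0)

mapped = {
    f"{new_prefix}.query_proj.weight": q_w,
    f"{new_prefix}.key_proj.weight": k_w,
    f"{new_prefix}.value_proj.weight": v_w,
    # Assumption: q_bias/v_bias are the query/value biases and the key
    # projection has no bias, so it maps to zeros.
    f"{new_prefix}.query_proj.bias": state[f"{prefix}.q_bias"],
    f"{new_prefix}.key_proj.bias": torch.zeros(HIDDEN),
    f"{new_prefix}.value_proj.bias": state[f"{prefix}.v_bias"],
}
```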
Also, is the code to run the model all the same?
This is with the weights at https://huggingface.co/microsoft/deberta-xxlarge-v2/tree/main
The code is different. Please check https://github.com/microsoft/DeBERTa/blob/penhe/debertav2/DeBERTa/deberta/disentangled_attention.py for the differences.
Working for me now, thanks!
Hello,
It seems like some of the weights were renamed or reshaped in the v2 model releases, and I couldn't figure out how to map them onto the old structure; I tried to match them up but couldn't. That was for the HuggingFace checkpoint, but I couldn't figure it out in this repo either. Could someone upload the v2 model file?