PranavB007 opened this issue 3 weeks ago
Hello!
Sorry to hear you are having issues with the code.
As I don't have access to your code, I don't have enough information to know exactly why you are getting that error. I assume you are resizing with a .T transpose? In our experience it works better to use the weights and biases from TransformerLens's GPT-2 small, since the Hugging Face b_U is None, and TransformerLens's values lead to more stable convergence.
Here is a snippet to get the W_U.pt and b_U.pt we used:
```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained('gpt2-small')

W_U = model.W_U
b_U = model.b_U

torch.save(W_U, 'W_U2.pt')
torch.save(b_U, 'b_U2.pt')
```
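After saving, you can load the tensors back and sanity-check the shapes (a quick sketch; for GPT-2 small, W_U should be [768, 50257] and b_U should be [50257]):

```python
import torch

# Load the saved tensors and confirm the GPT-2 small shapes.
W_U = torch.load('W_U2.pt')
b_U = torch.load('b_U2.pt')

print(W_U.shape)  # torch.Size([768, 50257])  -> [d_model, d_vocab]
print(b_U.shape)  # torch.Size([50257])       -> [d_vocab]
```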
Let me know if this works for you!
I have loaded the biases and weights for the GPT-2 model using `AutoModelForCausalLM`, but their sizes do not match the expected dimensions. The error I encountered is:

When I try reshaping it to `[768, 50257]`, it throws an error:
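For reference, a minimal sketch of what the Hugging Face side exposes (assuming the standard `gpt2` checkpoint): the `lm_head` weight is stored as `[50257, 768]` and needs a `.T` to become `[768, 50257]`, and there is no output bias, which matches the note above that the Hugging Face `b_U` is None.

```python
from transformers import AutoModelForCausalLM

# Minimal sketch: inspect the unembedding that Hugging Face's GPT-2 exposes.
hf_model = AutoModelForCausalLM.from_pretrained('gpt2')

# nn.Linear stores weight as [out_features, in_features] = [50257, 768],
# so a .T transpose is needed to get the expected [768, 50257].
print(hf_model.lm_head.weight.shape)    # torch.Size([50257, 768])
print(hf_model.lm_head.weight.T.shape)  # torch.Size([768, 50257])

# The lm_head has no bias, so there is no b_U equivalent on the HF side.
print(hf_model.lm_head.bias)            # None
```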