msakarvadia / AttentionLens

Interpreting the latent space representations of attention head outputs for LLMs
MIT License

b_u.pt and W_U.pt file size mismatch #51

Open · PranavB007 opened this issue 3 weeks ago

PranavB007 commented 3 weeks ago

I have loaded the biases and weights for the GPT-2 model using AutoModelForCausalLM, but their sizes do not match the expected dimensions. The error I encountered is:

Copying a param with shape torch.Size([50257, 768]) from checkpoint; the shape in the current model is torch.Size([768, 50257]).

When I try reshaping it to [768, 50257], it throws an error:

Invalid scalar type.
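For context, here is a minimal sketch of the kind of loading code that hits this mismatch (illustrative only; the `gpt2` HF checkpoint and variable names are assumptions, not my exact code). Note that a plain reshape keeps the flat element order and scrambles the values, while a transpose actually swaps the axes:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed setup: HF's GPT-2 checkpoint (any GPT-2 variant stores it the same way)
model = AutoModelForCausalLM.from_pretrained("gpt2")

# HF stores the unembedding as lm_head.weight with shape [vocab, d_model] = [50257, 768]
W_U_hf = model.lm_head.weight.detach()
print(W_U_hf.shape)  # torch.Size([50257, 768])

# .reshape([768, 50257]) would keep the same row-major element order and scramble values;
# transposing is what swaps the axes to the expected [768, 50257]
W_U = W_U_hf.T.contiguous()
print(W_U.shape)  # torch.Size([768, 50257])
```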
pettyjohnjn commented 3 weeks ago

Hello!

Sorry to hear you are having issues with the code.

As I don't have access to your code, I don't have enough information to say why you are getting that error. I assume you are transposing with .T? In our experience it works better to use the biases and weights from TransformerLens's GPT-2 small: HF's b_U is None, and TransformerLens's values lead to more stable convergence.

Here is a snippet to get the W_U.pt and b_U.pt we used:

```python
import torch
from transformer_lens import HookedTransformer

# Load GPT-2 small through TransformerLens
model = HookedTransformer.from_pretrained('gpt2-small')

# Unembedding weight and bias: W_U is [d_model, d_vocab], b_U is [d_vocab]
W_U = model.W_U
b_U = model.b_U

torch.save(W_U, 'W_U2.pt')
torch.save(b_U, 'b_U2.pt')
```
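Once saved, a quick sanity check on the shapes looks like this (a sketch; the expected sizes assume GPT-2 small's d_model of 768 and vocab of 50257):

```python
import torch

# Reload the saved tensors and confirm the orientation AttentionLens expects
W_U = torch.load('W_U2.pt')
b_U = torch.load('b_U2.pt')

print(W_U.shape)  # torch.Size([768, 50257])  (d_model, d_vocab)
print(b_U.shape)  # torch.Size([50257])
```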

Let me know if this works for you!