Additionally, the current implementation of `TransformerLensGenerativeLLM` only supports patching the "mlp_post" and "mlp_pre" activations. Feel free to modify `TransformerLensGenerativeLLM` to support patching the attention module as well.
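For reference, "mlp_post" and "mlp_pre" presumably map to TransformerLens's `blocks.{layer}.mlp.hook_post` / `hook_pre` hook points, and attention has analogous ones such as `blocks.{layer}.attn.hook_z` (per-head outputs before the output projection) and `blocks.{layer}.hook_attn_out`. A minimal sketch of patching a cached attention activation into a corrupted run, written against the public `HookedTransformer` API rather than this repo's wrapper (model name, prompts, and layer choice are illustrative assumptions):

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")  # illustrative model choice

# Prompts chosen to tokenize to the same length so shapes match when patching.
clean_tokens = model.to_tokens("John gave a book to Mary")
corrupt_tokens = model.to_tokens("Mark gave a book to Mary")

# Cache all activations from the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

layer = 5  # illustrative layer
hook_name = utils.get_act_name("z", layer)  # "blocks.5.attn.hook_z"

def patch_attn_z(activation, hook):
    # Overwrite the corrupted run's per-head attention outputs with the clean ones.
    return clean_cache[hook.name]

patched_logits = model.run_with_hooks(
    corrupt_tokens, fwd_hooks=[(hook_name, patch_attn_z)]
)
```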
Example script: vary the `intervention_layers` over the range 0-23 to find the best combination.
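As a rough illustration (not the original script, and using raw TransformerLens instead of the `TransformerLensGenerativeLLM` wrapper, whose exact interface isn't shown here), a sweep over layers might look like the sketch below. The model, prompts, and metric are placeholder assumptions:

```python
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2-medium")  # 24 layers, illustrative

# Same-length clean/corrupt prompts; " Paris" is the clean answer token.
clean_tokens = model.to_tokens("The capital of France is")
corrupt_tokens = model.to_tokens("The capital of Italy is")
target = model.to_single_token(" Paris")

_, clean_cache = model.run_with_cache(clean_tokens)
baseline = model(corrupt_tokens)[0, -1, target].item()

effects = {}
for layer in range(model.cfg.n_layers):  # 0-23 for a 24-layer model
    hook_name = utils.get_act_name("post", layer)  # "blocks.{layer}.mlp.hook_post"

    def patch(activation, hook):
        return clean_cache[hook.name]

    logits = model.run_with_hooks(corrupt_tokens, fwd_hooks=[(hook_name, patch)])
    # Indirect effect: how much patching this layer restores the clean answer.
    effects[layer] = logits[0, -1, target].item() - baseline

best_layer = max(effects, key=effects.get)
print(best_layer, effects[best_layer])
```

To test a combination of layers rather than one at a time, pass several `(hook_name, patch)` pairs in `fwd_hooks` in a single call.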
Feel free to modify `TransformerLensGenerativeLLM._generate()` to speed things up. The current implementation predicts the whole sequence; we probably only need to predict the first token to calculate the indirect effect.
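Since the indirect effect only needs the next-token distribution, a single forward pass over the prompt suffices; no autoregressive decoding is required. A sketch of the idea (the function name and signature here are hypothetical, not the repo's actual `_generate()`):

```python
import torch

def first_token_distribution(model, tokens, fwd_hooks=()):
    """One forward pass; return the next-token distribution at the last position.

    `model` is a transformer_lens.HookedTransformer and `fwd_hooks` carries the
    patching hooks. This replaces full sequence generation when only the first
    predicted token is needed for the indirect-effect metric.
    """
    with torch.no_grad():
        logits = model.run_with_hooks(tokens, fwd_hooks=list(fwd_hooks))
    return torch.softmax(logits[:, -1, :], dim=-1)  # [batch, d_vocab]
```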