Closed ChangWenhan closed 6 months ago
Hey,
Thank you for your interest! Unfortunately, I was sick, then on holiday, and am now preparing the camera-ready version of the paper (: Next month (hopefully), the whole library will finally be published alongside the paper!
To extract latent relevances, you can simply use a backward hook in PyTorch. For instance, in a Phi model the FFN consists of three layers: 1. linear, 2. activation function (the knowledge neurons), 3. linear.
If you want the relevance at the knowledge neurons, you must save the gradients at the output of (2).
```python
def append_backward_hooks_to_mlp(model):
    rel_layer = {}

    def generate_hook(layer_name):
        def backward_hook(module, input_grad, output_grad):
            # clone the relevance so it is not modified by LXT's
            # memory-optimized in-place operations, if those are used;
            # output_grad is a tuple, so take its first element
            rel_layer[layer_name] = output_grad[0].clone()
        return backward_hook

    # attach a hook to the last activation of every mlp layer
    for name, layer in model.model.named_modules():
        if name.endswith("mlp"):
            layer.activation_fn.register_full_backward_hook(generate_hook(name))

    return rel_layer
```
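To see the hook in action without downloading a full checkpoint, here is a minimal, self-contained sketch against a toy model that mimics the `model.model` → `...mlp.activation_fn` naming the hook code expects. The toy classes are hypothetical stand-ins, not the real Phi architecture; note that `register_full_backward_hook` passes gradients as tuples, so the hook clones the first element.

```python
import torch
import torch.nn as nn

# Hypothetical toy stand-in for a Phi-style decoder block: each block has
# an `mlp` submodule with fc1 -> activation_fn -> fc2.
class ToyMLP(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.activation_fn = nn.GELU()
        self.fc2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.fc2(self.activation_fn(self.fc1(x)))

class ToyBlock(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.mlp = ToyMLP(d_model, d_ff)

    def forward(self, x):
        return x + self.mlp(x)

class ToyModel(nn.Module):
    def __init__(self, d_model=8, d_ff=32, n_layers=2):
        super().__init__()
        # named `model` so that `model.model.named_modules()` works as above
        self.model = nn.Sequential(*[ToyBlock(d_model, d_ff) for _ in range(n_layers)])

    def forward(self, x):
        return self.model(x)

def append_backward_hooks_to_mlp(model):
    rel_layer = {}

    def generate_hook(layer_name):
        def backward_hook(module, input_grad, output_grad):
            # output_grad is a tuple; clone so LXT in-place ops cannot modify it
            rel_layer[layer_name] = output_grad[0].clone()
        return backward_hook

    for name, layer in model.model.named_modules():
        if name.endswith("mlp"):
            layer.activation_fn.register_full_backward_hook(generate_hook(name))

    return rel_layer

model = ToyModel()
rel_layer = append_backward_hooks_to_mlp(model)

x = torch.randn(1, 4, 8, requires_grad=True)  # (batch, seq, d_model)
out = model(x)
out[0, -1].sum().backward()  # backpropagate from the chosen output position

for name, rel in rel_layer.items():
    print(name, tuple(rel.shape))  # relevance at the knowledge neurons, (1, 4, 32)
```

With a real model you would instead backpropagate from the logit of interest (e.g. the top predicted token) and read the captured tensors out of `rel_layer` afterwards.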
Hope it helps, Reduan
Thank you for your response, and wishing you a speedy recovery! We will continue to follow your work and the LXT library, as they have been of great assistance to us.
Wishing you all the best, Wenhan
Hello! The code you've published runs smoothly, but how can we extract the relevance of neurons in different FFN layers of a large language model using the code in this repository?
Also, we are looking forward to the Latent Feature Visualization and Perturbation Evaluation parts.
Thank you!