oneal2000 / MIND

Source code of our paper MIND, ACL 2024 Long Paper

Missing Code for Extracting Activation Values? #2

Open F4biian opened 6 months ago

F4biian commented 6 months ago

Hey there, I can't find the source code that creates the './helm/hd/{model_name}/hd_act.json' files, which are part of the HELM dataset. Although you don't end up using them, I'd like to know how you generated them (so I can reproduce it).

bebr2 commented 6 months ago

We implemented it by modifying the corresponding model's code in the transformers package.

For instance, for the Llama model, we modified line 240 of modeling_llama.py in our local copy of the transformers package to:

a = self.act_fn(self.gate_proj(x))
self.activate_value_by_mind = a.clone().detach()  # stash the gate activation so it can be read out after the forward pass
down_proj = self.down_proj(a * self.up_proj(x))

(You can browse modeling_llama.py here)
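For reference, the line being replaced is the single-expression SwiGLU computation in LlamaMLP.forward (the exact line number varies across transformers versions, and releases with the tensor-parallel branch split it further):

down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))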

The code to obtain the activation value is as follows:

import torch

_ = model(torch.tensor(ids).to(model.device), output_hidden_states=True).hidden_states
act = model.model.layers[-1].mlp.activate_value_by_mind.clone().detach().squeeze()[-1].tolist()  # last decoder layer, last token position
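If you would rather not patch the transformers source, a forward hook can capture the same tensor. This is a minimal sketch under the assumptions that act_fn is an nn.Module (as it is in recent transformers releases), that the last decoder layer is the one of interest, and that model and ids are the same objects as in the snippet above; the helper and variable names are illustrative, not from the repository:

import torch

captured = {}

def save_gate_activation(module, inputs, output):
    # output is act_fn(gate_proj(x)) for the hooked MLP, i.e. the tensor the patch clones
    captured["act"] = output.detach().clone()

# hook the activation function inside the last decoder layer's MLP
handle = model.model.layers[-1].mlp.act_fn.register_forward_hook(save_gate_activation)

with torch.no_grad():
    _ = model(torch.tensor(ids).to(model.device))

act = captured["act"].squeeze()[-1].tolist()  # activation vector at the last token position
handle.remove()

The result should match activate_value_by_mind from the patched version, since the hook fires on exactly the tensor that the patched line clones.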

I hope this resolves your issue.

F4biian commented 6 months ago

This works perfectly, thank you very much!