ndif-team / nnsight

The nnsight package enables interpreting and manipulating the internals of deep learned models.
https://nnsight.net/
MIT License

Is it possible to deal with the activations before exiting the tracing context? #281

Open Shiroha-Offical opened 1 week ago

Shiroha-Offical commented 1 week ago

I'm trying to intervene on modules during a .generate call, conditioned on the output of a self-defined probe. However, the condition is computed from activations, and their values are not available until the tracing context exits. As a result, I can't branch on the condition inside the context.

Here is my desired behavior, demonstrated in code, followed by the error I hit when attempting it:

with llm.generate(prompt_template(query), max_new_tokens=max_lenth, top_k=1) as tracer:
    all_tokens = llm.generator.output.detach().cpu().save()
    inputs = llm.lm_head.input.save()
    for count in range(intervene_steps):
        for layer in range(num_layers):
            current_head_wise_activations = llm.model.layers[layer].next().self_attn.o_proj.output.detach().cpu().save()
            # Non-linear MLP prober
            probe_proba = trained_probe(current_head_wise_activations)
            if probe_proba < threshold:
                # do func

I managed to compute probe_proba with nnsight.apply, but I cannot use its value to decide whether to run func:

  File "XXX/test.py", line 108, in intervention
    if current_probe < threshold:
  File "XXX/lib/python3.10/site-packages/nnsight/tracing/Proxy.py", line 258, in __bool__
    return self.node.proxy_value.__bool__()
AttributeError: type object '_empty' has no attribute '__bool__'. Did you mean: '__doc__'?

I will be grateful for any help you can provide.
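
The traceback above is consistent with how deferred tracing generally behaves: inside the context, a value such as probe_proba is a placeholder that holds no data until the recorded graph executes, so a plain `if` has nothing to evaluate. The following is a toy sketch in plain Python, not nnsight's actual implementation; `LazyProxy`, `_EMPTY`, and `resolve` are hypothetical names, and the probe output is a hard-coded float.

```python
class _Empty:
    """Sentinel meaning 'no value has been computed yet'."""


_EMPTY = _Empty()


class LazyProxy:
    """Toy stand-in for a value recorded while tracing and filled in later."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument callable, run at execution time
        self.value = _EMPTY

    def __lt__(self, other):
        # Comparisons are deferred too: they return another proxy.
        return LazyProxy(lambda: self.resolve() < other)

    def __bool__(self):
        # Python calls this for `if proxy:`; there is nothing to test yet.
        if self.value is _EMPTY:
            raise RuntimeError("proxy has no value yet; cannot branch while tracing")
        return bool(self.value)

    def resolve(self):
        # Simulates graph execution: compute the value once and cache it.
        if self.value is _EMPTY:
            self.value = self._compute()
        return self.value


probe_proba = LazyProxy(lambda: 0.2)  # pretend this is the probe output
threshold = 0.5
condition = probe_proba < threshold   # still a proxy, not a bool

branch_error = None
try:
    if condition:                     # evaluated at trace time, so it fails
        pass
except RuntimeError as exc:
    branch_error = str(exc)

condition.resolve()                   # "execution" fills in the value
branch_after = bool(condition)        # now the comparison can be tested
```

In this sketch the failure comes from `__bool__` being called before any value exists, which mirrors the `__bool__` frame in the traceback.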

JadenFiotto-Kaufman commented 1 day ago

@Shiroha-Offical The next version of nnsight will allow you to use conditionals (if statements in general)! If you want to try it now, you can pip install --upgrade git+https://github.com/ndif-team/nnsight.git@0.4. I'd love to hear about your experience.
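
As a rough illustration of the general idea behind deferred conditionals, a tracing system can record the condition and the guarded intervention together and decide only at execution time, once real activations exist. This is a toy sketch in plain Python and makes no claim about how nnsight 0.4 implements it; `Graph`, `cond`, `run`, and `probe` are hypothetical names, and the "activation" is a single float.

```python
import math
from typing import Callable, List, Tuple

Predicate = Callable[[float], bool]
Action = Callable[[float], float]


class Graph:
    """Records (predicate, action) pairs while tracing and runs them later."""

    def __init__(self) -> None:
        self.guarded: List[Tuple[Predicate, Action]] = []

    def cond(self, predicate: Predicate, action: Action) -> None:
        # Nothing is evaluated here; both callables are stored for later.
        self.guarded.append((predicate, action))

    def run(self, activation: float) -> float:
        # At execution time the activation is real, so predicates can be checked.
        for predicate, action in self.guarded:
            if predicate(activation):
                activation = action(activation)
        return activation


def probe(activation: float) -> float:
    """Hypothetical probe: a sigmoid mapping the activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-activation))


threshold = 0.5
graph = Graph()
# Intervene (here: add 10.0) only when the probe score is below the threshold.
graph.cond(lambda a: probe(a) < threshold, lambda a: a + 10.0)

low = graph.run(-1.0)   # probe score below threshold, intervention applied
high = graph.run(2.0)   # probe score above threshold, activation unchanged
```

The design point is that the branch is taken inside `run`, not while the graph is being built, so the condition never has to be converted to a bool before its value exists.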