ndif-team / nnsight

The nnsight package enables interpreting and manipulating the internals of deep learned models.
https://nnsight.net/
MIT License

Save Intervention Graph #136

Closed AdamBelfki3 closed 5 months ago

AdamBelfki3 commented 5 months ago

This PR adds the ability to save the Intervention Graph generated during a tracing context. This is particularly useful for debugging and for understanding how the intervention graph is formed.

To save the graph locally, pass a visualize argument to model.trace(...). Its value is a dictionary that accepts the following kwargs:

"""
filename (str): Name of the Intervention Graph. Defaults to "graph".
format (str): Image format of the graphic. Defaults to "png".
directory (Optional[str]): Directory path to save the graphic in. If None, the graphic is saved to the current directory.
"""
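As a rough illustration of how these defaults combine, the sketch below computes where the saved image would be expected to land. The helper expected_graph_path is hypothetical and not part of nnsight, and the assumption that the file is named <filename>.<format> is a guess that the description above does not confirm.

```python
from pathlib import Path
from typing import Optional


def expected_graph_path(
    filename: str = "graph",
    format: str = "png",
    directory: Optional[str] = None,
) -> Path:
    """Hypothetical helper (not part of nnsight).

    Mirrors the documented defaults of the visualize dictionary and
    ASSUMES the image is written as <directory>/<filename>.<format>.
    """
    # A directory of None means the current working directory.
    base = Path(directory) if directory is not None else Path.cwd()
    return base / f"{filename}.{format}"


# Same dictionary shape as the one passed to model.trace(..., visualize=...).
visualize = {"filename": "Layer 1 Output Setting", "directory": "intervention-graphs"}
path = expected_graph_path(**visualize)
print(path)
```

Omitted keys fall back to the documented defaults, so calling expected_graph_path() with no arguments points at a file named graph.png in the current directory under the same naming assumption.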

Here is an example of how to use the feature:

from collections import OrderedDict
from nnsight import NNsight
import torch

input_size = 5
hidden_dims = 10
output_size = 2

torch.manual_seed(423)

net = torch.nn.Sequential(
    OrderedDict(
        [
            ("layer1", torch.nn.Linear(input_size, hidden_dims)),
            ("layer2", torch.nn.Linear(hidden_dims, output_size))
        ]
    )
).requires_grad_(False)

input = torch.rand((1, input_size))

model = NNsight(net)

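# Trace a forward pass and save the resulting intervention graph
# using the options in the visualize dictionary.
with model.trace(input, visualize={"filename": "Layer 1 Output Setting", "directory": "intervention-graphs"}):

    # Save a copy of layer1's output before it is modified.
    l1_output_before = model.layer1.output.clone().save()

    # Set feature index 2 of layer1's output to zero.
    model.layer1.output[:, 2] = 0

    # Save layer1's output after the intervention.
    l1_output_after = model.layer1.output.save()

# The saved values are available once the tracing context exits.
print("L1_Output_Before: ", l1_output_before)
print("L1_Output_After: ", l1_output_after)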

Running the example saves the following graph:

[Image: Layer 1 Output Attribute Setting]
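
For comparison, the intervention in the example (cloning layer1's output, then zeroing feature index 2) can be sketched in plain PyTorch with a forward hook. This is only an illustration of the effect of the intervention; it does not use nnsight, it does not produce an intervention graph, and it is not a description of how nnsight works internally. The names captured and zero_feature_2 are introduced here for illustration.

```python
from collections import OrderedDict

import torch

torch.manual_seed(423)

# Same two-layer network as in the example above.
net = torch.nn.Sequential(
    OrderedDict(
        [
            ("layer1", torch.nn.Linear(5, 10)),
            ("layer2", torch.nn.Linear(10, 2)),
        ]
    )
).requires_grad_(False)

captured = {}  # illustrative container for the before/after tensors


def zero_feature_2(module, inputs, output):
    """Forward hook: record layer1's output, then zero feature index 2."""
    captured["before"] = output.clone()
    output[:, 2] = 0
    captured["after"] = output
    # Returning a value from a forward hook replaces the module's output.
    return output


handle = net.layer1.register_forward_hook(zero_feature_2)
x = torch.rand((1, 5))
result = net(x)
handle.remove()  # detach the hook so later forward passes are unmodified

print("L1_Output_Before: ", captured["before"])
print("L1_Output_After: ", captured["after"])
```

After the forward pass, the two captured tensors should differ only at feature index 2, which is the comparison the before/after prints in the example are meant to show.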