ndif-team / nnsight

The nnsight package enables interpreting and manipulating the internals of deep learned models.
https://nnsight.net/
MIT License

In-place editor calls repeat when the modified model is run #210

Open mitroitskii opened 2 weeks ago

mitroitskii commented 2 weeks ago

For the model.edit context with inplace=True and return_context=True, every method call on the context object (editor) is repeated when the modified model is traced. The only editor method I am aware of is .log(), so I tested the behavior with it.

It looks like each run of a model.edit context containing an editor.log call adds that call, with the specified argument, to the intervention graph. Then, when the model that was edited in place is traced, it displays every editor.log call added inside the model.edit contexts.

[Screenshot 2024-08-30 at 4:45:06 AM]

AdamBelfki3 commented 2 weeks ago

Does this behavior not appear if you use nnsight.log()?

AdamBelfki3 commented 2 weeks ago

Isn't this expected behavior, since you are running the cell multiple times with in-place editing?

mitroitskii commented 2 weeks ago

Same thing happens with nnsight.log().

My understanding was that in-place editing is supposed to modify only the modules of the model, whereas right now it looks like it attaches literally everything within the context to the intervention graph. Is that expected?

AdamBelfki3 commented 2 weeks ago

Yes, it is expected! It basically defines any default interventions to be run before the newly defined intervention graph in a certain context.
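
To make the behavior discussed in this thread concrete, here is a minimal pure-Python sketch. It does not use nnsight and is not how nnsight is implemented: `ToyModel`, `Recorder`, `default_ops`, and the `new_ops` argument are all hypothetical names invented for illustration. It only mimics the two points above: an in-place edit remembers everything recorded in its context, and those remembered operations run before whatever a later trace defines.

```python
from contextlib import contextmanager


class Recorder:
    """Hypothetical editor object: records every call made inside an edit context."""

    def __init__(self):
        self.ops = []

    def log(self, message):
        self.ops.append(f"log: {message}")


class ToyModel:
    """Hypothetical stand-in for an editable model (an assumption, not nnsight code)."""

    def __init__(self):
        # Operations remembered by in-place edits; replayed on every trace.
        self.default_ops = []

    @contextmanager
    def edit(self, inplace=False):
        recorder = Recorder()
        yield recorder
        if inplace:
            # Everything recorded in the context is kept, not only module changes.
            self.default_ops.extend(recorder.ops)
        # Simplification: in this toy, a non-in-place edit is simply discarded.

    def trace(self, new_ops=()):
        # Remembered defaults run first, then the operations defined for this trace.
        return [*self.default_ops, *new_ops]


model = ToyModel()
for _ in range(3):  # stands in for re-running the same notebook cell three times
    with model.edit(inplace=True) as editor:
        editor.log("hello")

output = model.trace(new_ops=["log: from trace"])
print(output)
# ['log: hello', 'log: hello', 'log: hello', 'log: from trace']
```

In this toy, the repeated output comes purely from running the same in-place edit more than once; a fresh `ToyModel()` has no remembered operations.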

mitroitskii commented 2 weeks ago

This feels potentially tricky and error-prone, as so many operations are now patched in and are part of the intervention graph: literally every built-in operation, all torch methods, all torch tensor creation operations, and einops functions.

How can we make it more obvious that .edit() not only modifies the modules of the model but literally "remembers" everything the user specifies in the context?