Currently, in the attribution and activation patching examples on nnsight, patching is done FROM clean TO corrupt, whereas in standardized settings patching is done FROM corrupt TO clean. Alternatively, base and source could be a better set of terms, since they name the direction explicitly.
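To make the two directions concrete, here is a toy sketch in plain Python. It is not nnsight code: the two-unit "model" and the names `run` and `patch` are hypothetical and exist only for illustration. Under the base/source naming assumed here, the activation is read from the source run and written into the base run.

```python
# Toy illustration only; not the nnsight API. All names are hypothetical.

def run(x, patch_index=None, patch_value=None):
    """Toy model: two hidden units, output is their sum."""
    hidden = [2 * x[0], 3 * x[1]]
    if patch_index is not None:
        # Overwrite one hidden activation with a value from another run.
        hidden[patch_index] = patch_value
    return hidden, sum(hidden)

def patch(base, source, index):
    """Patch FROM source TO base: read from source, write into base."""
    source_hidden, _ = run(source)
    _, output = run(base, patch_index=index, patch_value=source_hidden[index])
    return output

clean = (1, 1)    # unpatched output: 2 + 3 = 5
corrupt = (4, 5)  # unpatched output: 8 + 15 = 23

# FROM corrupt TO clean: base = clean, source = corrupt
corrupt_to_clean = patch(base=clean, source=corrupt, index=0)

# FROM clean TO corrupt: base = corrupt, source = clean
clean_to_corrupt = patch(base=corrupt, source=clean, index=0)

print(corrupt_to_clean, clean_to_corrupt)
```

The two calls differ only in which input plays the base role and which plays the source role, which is why naming the roles may be less ambiguous than naming the inputs.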
Will submit a PR for this soon if it seems ok, and someone doesn't beat me to it. :)
cc/ @JadenFiotto-Kaufman Can you please :+1: if this sounds reasonable?