jettjaniak / teren

Linking activation space features to model behavior
Apache License 2.0

Let each perturbation specify whether it perturbs from the activation or toward another point #20

Open GiglemaAI opened 4 months ago

GiglemaAI commented 4 months ago

Perturbations where we subtract resid_acts and those where we don't should be distinguished, and both should be accessible, for example through a parameter.

jettjaniak commented 4 months ago

Current perturbations always return a direction, not a point toward which you move. Should we just be specific in the docstring / name if that direction points toward a special point?
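
To make the two cases concrete, here is a minimal sketch of what a parameter could look like. It is not teren's actual API: the function names, the `toward_point` flag, and the use of plain Python lists instead of tensors are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def perturbation_direction(
    resid_acts: Vector,
    vector: Vector,
    toward_point: bool = False,  # hypothetical flag, not part of teren
) -> Vector:
    """Return the direction in which resid_acts would be perturbed.

    toward_point=False: `vector` is already a direction and is returned as is.
    toward_point=True: `vector` is a target point, so the direction is
    vector - resid_acts (the "-resid_acts" case from this issue).
    """
    if len(resid_acts) != len(vector):
        raise ValueError("resid_acts and vector must have the same length")
    if toward_point:
        return [t - a for t, a in zip(vector, resid_acts)]
    return list(vector)


def apply_perturbation(resid_acts: Vector, direction: Vector, scale: float) -> Vector:
    """Move resid_acts along `direction` by `scale` (hypothetical helper)."""
    return [a + scale * d for a, d in zip(resid_acts, direction)]
```

With `toward_point=True` and `scale=1.0` the perturbed activation lands exactly on the target point. With `toward_point=False` the same vector is treated as an offset from the activation. Either way the function returns a direction, so the alternative suggested above (documenting or naming which perturbations point toward a special point) would also work without a new parameter.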