NieSYsc20 closed 4 months ago
Hello,
The activation vector (the direction θ) is determined by the probing process. α and σ are both scalars and only control the strength of the intervention: α is a hyperparameter, and σ is the standard deviation of the features along the truthful direction.
The intuition is that the feature space is anisotropic, so different directions require different intervention strengths. Scaling by σ is a preliminary approach I employed to address this issue.
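To make the roles of the three quantities concrete, here is a minimal NumPy sketch rather than the repository's actual code. It assumes θ is normalized to unit length and that σ is estimated as the standard deviation of the activations projected onto θ. The function name `intervention_shift` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def intervention_shift(head_acts, theta, alpha):
    """Return (alpha * sigma * theta_hat, sigma) for one attention head.

    head_acts: (n_samples, d) activations collected for this head.
    theta:     (d,) direction found by probing (assumed; normalized below).
    alpha:     scalar hyperparameter controlling intervention strength.
    """
    theta_hat = theta / np.linalg.norm(theta)   # unit-length direction
    proj = head_acts @ theta_hat                # scalar coordinate along theta
    sigma = proj.std()                          # spread of features along theta
    return alpha * sigma * theta_hat, sigma

rng = np.random.default_rng(0)
# Anisotropic toy features: dimension 0 has a much larger spread than the others.
acts = rng.normal(size=(1000, 4)) * np.array([5.0, 1.0, 1.0, 1.0])
theta = np.array([1.0, 0.0, 0.0, 0.0])          # hypothetical probe direction

shift, sigma = intervention_shift(acts, theta, alpha=2.0)
shifted = acts + shift                           # same vector added to every activation
```

Under these assumptions, the projection of every activation onto θ moves by exactly α·σ, which is what "α times the standard deviation" means. A direction with a large spread receives a proportionally larger shift than one with a small spread, even with the same α.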
Hi, great work!
In the paper, ασθ is added to the Attn(·) output, which the paper explains as: "This is equivalent to shifting activations along the truthful directions for α times the standard deviation."
I am confused about the insight behind this design. What is the specific meaning of "α times the standard deviation" in this paper? Why can the activation vector be calculated in this way? Could you please provide a more detailed explanation?
Thanks!