likenneth / honest_llama

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Inquiry on equation of paper #3

Closed: jongjyh closed this issue 1 year ago

jongjyh commented 1 year ago

Hi,

excellent work! I'd like to ask about the intuition behind the ITI equation (i.e., Equation (2) in the paper). Does it have any connection with PPLM, which updates the hidden state according to its score (the gradient of the log-likelihood)? Why does it use $\sigma \theta$ for the update? I couldn't find an explicit explanation in the paper.

Also, does ITI need iterative updates like PPLM?

Thanks in advance. :)

likenneth commented 1 year ago

Hi,

Both PPLM and ITI operate on the model's internal representation space. However, PPLM obtains its gradient by running SGD against a text classifier, whereas ITI uses a learned and fixed direction $\theta$.

$\theta$ is normalized to lie on the unit sphere, so we use $\alpha$ to calibrate the strength of the intervention.
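
For concreteness, a sketch of what the intervention does at one selected head (paraphrasing Equation (2), not quoting it verbatim):

$$
x_l^h \leftarrow x_l^h + \alpha \, \sigma_l^h \, \theta_l^h
$$

where $\theta_l^h$ is the unit-norm probe direction for head $h$ in layer $l$, $\sigma_l^h$ is the standard deviation of activations projected onto that direction, and $\alpha$ scales the overall strength.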

Unlike PPLM, ITI is a one-pass update; no iterative optimization is needed at inference time.
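
To illustrate the one-pass nature, here is a minimal, hypothetical sketch (not the code in this repo) of applying the shift to one head's activations during a single forward pass; the function name `iti_shift`, the tensor shapes, and the numeric values are assumptions for illustration only:

```python
import torch

def iti_shift(head_output: torch.Tensor, theta: torch.Tensor,
              sigma: float, alpha: float) -> torch.Tensor:
    """Shift one attention head's output along a fixed direction (single additive step).

    head_output: (batch, seq_len, head_dim) activations of one attention head
    theta:       (head_dim,) unit-norm direction learned offline
    sigma:       scalar std of activations projected onto theta (assumed, measured offline)
    alpha:       intervention strength hyperparameter
    """
    return head_output + alpha * sigma * theta  # one additive shift, no gradient steps

if __name__ == "__main__":
    torch.manual_seed(0)
    head_dim = 64
    theta = torch.randn(head_dim)
    theta = theta / theta.norm()          # normalized to the unit sphere
    sigma = 1.3                           # example value, assumed
    alpha = 15.0                          # example strength, assumed
    x = torch.randn(1, 8, head_dim)       # fake head activations
    x_shifted = iti_shift(x, theta, sigma, alpha)
    print((x_shifted - x).norm(dim=-1))   # constant shift of magnitude alpha * sigma
```

Because $\theta$ is fixed ahead of time, this is just one additive shift per selected head per forward pass, whereas PPLM re-runs gradient steps against its classifier during generation.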