likenneth / honest_llama

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
MIT License

Difference between tqa_gen_end_q and tqa_gen? #11

Closed jongjyh closed 1 year ago

jongjyh commented 1 year ago

Hi,

I see there are two formats for extracting features from tqa: tqa_gen_end_q and tqa_gen. The former has a random question appended to it.

Why is that, and will it affect the performance?

Best,

likenneth commented 1 year ago

Hi, I'm cross-posting my reply from here.

tqa_gen_end_q is a technical detail not covered in the paper. It basically appends a question after the QA pair, as seen in tqa_mc2, which makes the feature distribution closer to real prompting scenarios. The only use of tqa_gen_end_q is to estimate the standard deviation of the feature distribution (projected onto the found truthful direction), so the label and order do not matter, as you asked in another issue. The results in the paper were obtained by running this repo with its default settings, which use tqa_gen_end_q to estimate the standard deviation and tqa_mc2 to find the truthful directions.

This isn't a main contribution of the paper, so it's not covered in the prose. As I recall, it slightly helped the performance in some initial experiments.
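To illustrate why the label and order do not matter for the standard deviation estimate, here is a minimal sketch. It is not the repo's actual code: `projected_std`, `acts`, and `direction` are hypothetical names, and the activations are random stand-ins for the features extracted from tqa_gen_end_q. It only shows that the standard deviation of features projected onto a direction uses no labels and is unchanged when the samples are reordered.

```python
import numpy as np


def projected_std(activations: np.ndarray, direction: np.ndarray) -> float:
    """Standard deviation of activations projected onto a direction.

    activations: (num_samples, hidden_dim) array of features (hypothetical stand-in).
    direction:   (hidden_dim,) vector; it is normalized to unit length here.
    """
    unit = direction / np.linalg.norm(direction)
    projections = activations @ unit  # one scalar per sample
    return float(np.std(projections))  # population std; no labels involved


# Random stand-in data (assumption: real features would come from the model).
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 16))
direction = rng.normal(size=16)

sigma = projected_std(acts, direction)

# Shuffling the samples leaves the estimate unchanged.
perm = rng.permutation(len(acts))
sigma_shuffled = projected_std(acts[perm], direction)
```

Since the estimate depends only on the set of projected values, any labels attached to the samples and the order in which they were collected have no effect on it.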