likenneth / honest_llama

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
MIT License

Inquiry about visualizing the results in the paper #4

Closed jongjyh closed 1 year ago

jongjyh commented 1 year ago

Hi! Sorry to bother you again. :)

visualization

I'm a little confused about the visualization process. Here is my understanding: you project every data point in the dataset onto two orthogonal truthful directions (by a dot product, I guess?) and visualize the two resulting distributions of projected values.

If so, why are the means of positive samples along both directions less than 0 in the visualization results of the paper, while the means of negative samples are greater than 0? This doesn't seem to make sense.

image
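A minimal sketch of the projection described above, using synthetic data rather than real model activations. Everything here is an assumption for illustration: the function names (`orthogonal_directions`, `project`), the toy Gaussian activations, and the way the two directions are built (class-mean difference plus Gram-Schmidt) are not taken from the honest_llama code and may differ from how the paper's figure was produced.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def orthogonal_directions(pos, neg, second_hint):
    """Return two orthonormal directions (hypothetical construction).

    d1 points from the negative-class mean to the positive-class mean.
    d2 is `second_hint` with its component along d1 removed (Gram-Schmidt).
    """
    d1 = unit(pos.mean(axis=0) - neg.mean(axis=0))
    d2 = unit(second_hint - (second_hint @ d1) * d1)
    return d1, d2

def project(acts, d1, d2):
    """Project each row of `acts` onto d1 and d2 with dot products."""
    return np.stack([acts @ d1, acts @ d2], axis=1)

# Toy data: two Gaussian clusters separated along the first coordinate.
rng = np.random.default_rng(0)
dim = 16
shift = np.zeros(dim)
shift[0] = 2.0
pos = rng.normal(size=(500, dim)) + shift  # stand-in for "truthful" samples
neg = rng.normal(size=(500, dim)) - shift  # stand-in for "false" samples
hint = rng.normal(size=dim)                # arbitrary seed for the second direction

d1, d2 = orthogonal_directions(pos, neg, hint)
pos_2d = project(pos, d1, d2)
neg_2d = project(neg, d1, d2)

# Flipping the orientation of a direction flips the sign of every projection.
pos_2d_flipped = project(pos, -d1, d2)
```

With this construction, the positive samples have a positive mean along `d1` and the negative samples a negative one; negating `d1` swaps the signs. The sketch only shows that the sign of the projected means depends on the orientation chosen for each direction vector, and makes no claim about what happened in the paper's figure.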

reproduce prompt learning experiment

Also, could you provide the few-shot prompt samples you used? I tried running with --use_prefix to reproduce the experiment but got a KeyError, so I checked the code:

        if args.use_prefix: 
            preset = 'few_shot'

It looks like the TruthfulQA repo isn't ready for this setting.

I would really appreciate it if you could help me clarify my confusion.

likenneth commented 1 year ago

Hi, I've removed the few-shot option to avoid any confusion. I'm looking into the figure and will update the arXiv paper with the correct version.

jongjyh commented 1 year ago

> Hi, I've removed the few-shot option to avoid any confusion. I'm looking into the figure and will update the Arxiv with the correct version.

Hi, glad to hear from you.

There is another question I would like to discuss, which concerns Table 1 and Table 2.

Table 2 reports the test-set results of 2-fold cross-validation (100% of the data), and Table 1 reports the validation results (the training set and the validation set are each 5% of TruthfulQA).

Have I got that right?

Thanks!

likenneth commented 1 year ago

That's right!