ryanshea10 / personachat_offline_rl


Problem about the result (metrics) #1

Open Fang-git0 opened 9 months ago

Fang-git0 commented 9 months ago

I tried to run your program and got the following results:

Attributes eval: Counter({'similar': 36, 'label': 30, 'rand': 12, 'neg': 10})

Haves eval: Counter({'similar': 34, 'label': 82, 'rand': 24, 'neg': 128})

Likes eval: Counter({'similar': 94, 'label': 50, 'rand': 10, 'neg': 32})

May I ask how 'similar', 'label', 'rand', and 'neg' correspond to the evaluation metrics in the paper?

[image: results table from the paper]

Thank you! My results may not be entirely accurate, and I had trouble following the code, so I couldn't work out which metric each of these labels corresponds to.

ryanshea10 commented 9 months ago

Hi, here's the relationship between the labels in the code vs the paper:

Hits@1: label
Entail@1: similar
Rand@1: rand
Contradict@1: neg

Your results seem fairly similar to what we got for BB3 (BlenderBot 3). The results in that table are aggregated across the "likes", "haves", and "attributes" categories.

Fang-git0 commented 9 months ago

Thank you very much. I'd also like to ask whether something is wrong with my 128% result, since it exceeds 100%.

ryanshea10 commented 9 months ago

You shouldn't be getting 128%. Those numbers are counts, so after you aggregate them across the categories, you need to divide the result by the total number of samples (542) to get the percentage. Given the numbers you posted, your result for Entail@1 is (36 + 34 + 94) / 542 = 30.3%.
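For anyone reproducing these numbers, the aggregation described above can be sketched in Python. This is a minimal sketch, not code from the repository: it assumes the three Counter outputs are the raw per-category counts, takes the label-to-metric mapping from the reply above, and uses the total of 542 samples stated there. The names `LABEL_TO_METRIC` and `aggregate_metrics` are hypothetical.

```python
from collections import Counter

# Per-category counts as posted in this issue (keys are the labels used in the code).
results = {
    "attributes": Counter({"similar": 36, "label": 30, "rand": 12, "neg": 10}),
    "haves": Counter({"similar": 34, "label": 82, "rand": 24, "neg": 128}),
    "likes": Counter({"similar": 94, "label": 50, "rand": 10, "neg": 32}),
}

# Hypothetical name: maps code labels to the paper's metric names (per the maintainer's reply).
LABEL_TO_METRIC = {
    "label": "Hits@1",
    "similar": "Entail@1",
    "rand": "Rand@1",
    "neg": "Contradict@1",
}

# Total number of evaluation samples, as stated in the thread.
TOTAL_SAMPLES = 542


def aggregate_metrics(per_category, total=TOTAL_SAMPLES):
    """Sum the counts across categories, then divide each sum by the total sample count."""
    combined = Counter()
    for counts in per_category.values():
        combined.update(counts)  # adds counts key by key
    return {LABEL_TO_METRIC[label]: combined[label] / total for label in LABEL_TO_METRIC}


if __name__ == "__main__":
    for metric, value in aggregate_metrics(results).items():
        print(f"{metric}: {value:.1%}")
```

Run on the counts posted in this thread, it prints Hits@1: 29.9%, Entail@1: 30.3%, Rand@1: 8.5%, and Contradict@1: 31.4%. These add up to 100% because the posted counts across all three categories sum to exactly 542 (88 + 268 + 186), which suggests each sample contributes exactly one label.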