thu-coai / UNION

UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

Computing Perplexity #6

Open inimah opened 1 year ago

inimah commented 1 year ago

Hi, @JianGuanTHU Thanks for making the data publicly available.

Could you please elaborate on how the current work computes the "Perplexity" metric? Is it sentence-level perplexity or per-token perplexity?

The paper mentions in a footnote: "We take the minus of perplexity for all the following..."

But the metric outputs in ~/Data/../metric_output/ppl.txt do not seem to fit the text inputs reasonably. What does "minus of perplexity" mean in this context?

For example, here is the score on sample ID 151 from ant_data_all (I am using the Hugging Face `evaluate` perplexity metric):

Prediction text: ["we were looking for something fun to do on a female night . Female wife and i were so excited . we went to the mall . we had a great time . we had a great time ."]

results: {'perplexities': [55.47270202636719], 'mean_perplexity': 55.47270202636719}

Whereas in ppl.txt the score for this sample is 2.5693.
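To make the question concrete: the two interpretations above give different numbers. A minimal sketch of the distinction, using hypothetical per-token log-probabilities rather than outputs from any actual model:

```python
import math

# Hypothetical per-token natural-log probabilities for two sentences.
# Real values would come from a language model; these are made up.
sent_logprobs = [
    [-2.0, -1.5, -3.0],
    [-0.5, -1.0],
]

def sentence_perplexity(logprobs):
    # Perplexity of one sentence: exp of the mean negative
    # log-likelihood over that sentence's tokens.
    return math.exp(-sum(logprobs) / len(logprobs))

# Interpretation 1: compute perplexity per sentence, then average
# the sentence-level scores (what `evaluate` reports as mean_perplexity).
per_sentence = [sentence_perplexity(lp) for lp in sent_logprobs]
mean_ppl = sum(per_sentence) / len(per_sentence)

# Interpretation 2: pool all tokens first, then exponentiate the
# mean negative log-likelihood (corpus/token-level perplexity).
all_tokens = [lp for s in sent_logprobs for lp in s]
corpus_ppl = math.exp(-sum(all_tokens) / len(all_tokens))
```

The two quantities generally disagree, so knowing which one ppl.txt contains (and what transformation "minus of perplexity" applies on top) matters for reproducing the scores.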