mingkaid / rl-prompt

Accompanying repo for the RLPrompt paper
MIT License

A question about PPL score #29

Closed. FayeXXX closed this issue 1 year ago.

FayeXXX commented 1 year ago

Thank you for your excellent work. When I run your code on my dataset and evaluate the perplexity (PPL) score, do I need to train a separate model for PPL? Does line 34 in examples/text-style-transfer/evaluation/run_eval.py imply that we need to load a fine-tuned GPT-2? And could you please give me some hints on how to get one?

A quick reply would be much appreciated. Thanks again.

MM-IR commented 1 year ago

Hi, as far as I know, for general text generation tasks it is not strictly necessary to further fine-tune an LM to evaluate generation fluency. However, in our text style transfer task we have parallel text corpora, so we can directly fine-tune an LM on them and use it to compute PPL, which should reflect generation fluency with respect to the corpus text distribution. So think carefully about your task scenario; if it is necessary, you can follow our experimental setup to build your own fine-tuned LM for evaluating the PPL score.
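For reference, PPL is the exponentiated average negative log-likelihood per token, PPL = exp(-(1/N) Σ log p(x_i | x_<i)), so whichever LM you pick (fine-tuned or off-the-shelf) only supplies the per-token log-probabilities. The sketch below assumes you already have those natural-log probabilities; the function names are hypothetical and not taken from the rl-prompt repo.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """PPL = exp(mean negative log-likelihood), using natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


def corpus_perplexity(sentences: Sequence[Sequence[float]]) -> float:
    """Token-level corpus PPL: pool every token so longer outputs weigh more."""
    pooled = [lp for sent in sentences for lp in sent]
    return perplexity(pooled)


# A model that assigns probability 1/4 to every token has PPL 4.
print(perplexity([math.log(0.25)] * 3))
```

As an aside, with Hugging Face `transformers` a causal LM such as `GPT2LMHeadModel` called with `labels=input_ids` returns `loss` as the mean token cross-entropy, so `math.exp(loss.item())` gives the same quantity for a single sequence. Whether to pool tokens across the corpus or average per-sentence PPLs is a reporting choice; the two generally give different numbers.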

Since this is a clarification, I am closing the issue now.

MM-IR commented 1 year ago

Also note that PPL is only one of several fluency measures, and existing work has shown it has its own limitations. I won't go into detail here; if you are interested, see the text style transfer (TST) work cited in our paper.

FayeXXX commented 1 year ago

Thank you so much. There are indeed many limitations to PPL. As for TST work, most of the datasets are non-parallel. Does that mean we can load Hugging Face language models directly to measure PPL, without fine-tuning on the TST dataset?

FayeXXX commented 1 year ago

I'm also wondering whether it is necessary to measure fluency at all. Do you think it is reasonable to use a grammar check instead of fluency measures? Thank you for your patience and your kind, instructive replies.

MM-IR commented 1 year ago

Sorry for the late reply; I had not noticed your follow-up questions.

  1. PPL LM: Yes.
  2. A lot of work prefers to use a grammar check (a fine-tuned classifier) as the fluency measure. Please see the work we cited for more details.
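On item 1, one caveat worth keeping in mind is that PPL is always relative to the LM doing the scoring. The toy sketch below is a hypothetical stand-in (an add-one-smoothed bigram model, not a GPT-2 and not the repo's evaluation code) that shows the same sentence getting a lower PPL from an LM fit to its own domain than from one fit to unrelated text; the tiny corpora are made up for illustration.

```python
import math
from collections import Counter
from typing import List

BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Toy add-one-smoothed bigram LM (hypothetical stand-in for a real LM)."""

    def __init__(self, corpus: List[str]):
        self.context_counts = Counter()  # how often each token appears as a context
        self.bigram_counts = Counter()   # (previous token, next token) counts
        vocab = set()
        for line in corpus:
            toks = [BOS] + line.split() + [EOS]
            vocab.update(toks[1:])  # every token the model may have to predict
            self.context_counts.update(toks[:-1])
            self.bigram_counts.update(zip(toks[:-1], toks[1:]))
        self.vocab_size = len(vocab) + 1  # +1 slot for unseen word types

    def logprob(self, prev: str, word: str) -> float:
        num = self.bigram_counts[(prev, word)] + 1
        den = self.context_counts[prev] + self.vocab_size
        return math.log(num / den)

    def perplexity(self, sentence: str) -> float:
        toks = [BOS] + sentence.split() + [EOS]
        lps = [self.logprob(p, w) for p, w in zip(toks[:-1], toks[1:])]
        return math.exp(-sum(lps) / len(lps))


# Made-up corpora: one matching the sentence's domain, one unrelated.
in_domain = BigramLM(["the food was great", "the service was great", "the food was tasty"])
off_domain = BigramLM(["the market fell sharply", "the index rose sharply", "the market was volatile"])

sentence = "the food was great"
print(f"in-domain PPL:  {in_domain.perplexity(sentence):.2f}")
print(f"off-domain PPL: {off_domain.perplexity(sentence):.2f}")
```

Because the scale depends on the scoring LM, PPL numbers computed with different LMs are not directly comparable; keeping one fixed LM across all systems being compared avoids that problem.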
MM-IR commented 1 year ago

Let's not equate PPL with fluency: PPL is only one fluency measure. Sorry for my imprecise wording earlier.

FayeXXX commented 11 months ago

Thank you for your detailed reply; it helped me a lot.