Closed FayeXXX closed 1 year ago
Hi, as far as I know, for a general text generation task it is not strictly necessary to further fine-tune a LM to evaluate generation fluency. However, in our text style transfer task we have the parallel text corpora, which means we can directly fine-tune a LM and evaluate PPL against it; the score then reflects generation fluency with respect to the corpus text distribution. So please think carefully about your task scenario: if it is necessary, you can follow our experimental setups to build your own tuned LM for PPL evaluation.
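For intuition, PPL is just the exponentiated average negative log-likelihood of the tokens under the LM. Here is a minimal plain-Python sketch of that computation (not the repo's actual evaluation code, which presumably scores tokens with a fine-tuned GPT-2; the probabilities below are made-up illustrative values):

```python
import math

def perplexity(token_probs):
    """Compute perplexity from per-token probabilities.

    PPL = exp(-(1/N) * sum(log p_i)). Lower PPL means the text is
    more likely under the model, i.e. more "fluent" with respect to
    the model's training distribution.
    """
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# Tokens the model finds likely yield a low PPL...
fluent = perplexity([0.5, 0.4, 0.6, 0.5])
# ...while unlikely tokens inflate it.
disfluent = perplexity([0.01, 0.02, 0.05, 0.01])
print(fluent, disfluent)
```

This is why fine-tuning matters: the same sentence can receive very different PPL depending on which corpus the scoring LM was tuned on, so a LM tuned on the TST corpus measures fluency relative to that corpus's distribution.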
Since this is a clarification, I am closing this now.
And PPL is only one of several fluency measures, and it has its own limitations, as shown in existing work. I won't go into details here; if you are interested, you can check some of the TST work we cited in our paper.
Thank you so much; there are indeed many limitations to PPL. As for TST work, most of its datasets are non-parallel. Does that mean we can load Hugging Face language models directly to measure PPL, without fine-tuning on the TST dataset?
I'm also wondering whether it is necessary to measure fluency at all. Do you think it is reasonable to use a grammar check instead of fluency measures? Thank you for your patience and your instructive and kind-hearted reply.
Sorry for the late reply... I had not noticed your further questions.
Let's not view PPL == fluency; PPL is just one of the fluency measurements. Sorry for my imprecise language use before.
Thank you for your detailed reply, it helps me a lot.
Thank you for your excellent work. When I run your code on my dataset and evaluate the PPL score, do I need to train another model for PPL? Line 34 in examples/text-style-transfer/evalustion/run_eval.py seems to load a fine-tuned GPT-2; does that imply we need one? And could you please give me some hints on how to obtain it?
Your quick reply will be really appreciated, and thanks again.