KristiyanVachev opened this issue 3 years ago
Hi! I've started working on my own QG algorithm for my Master's thesis, and I'm trying to learn how to evaluate a model.
Since you've posted your metrics, I've been trying to replicate them, but I'm having some trouble. Can you tell me what your approach was?
I'm using nlg-eval as well, and from your references.txt I gather that you generate one hypothesis for each question in the SQuAD dev set.
I ran your algorithm (t5-base) on the first 1000 references.
The results I got from nlg-eval were:
Bleu_1: 0.295304
Bleu_2: 0.202987
Bleu_3: 0.144872
Bleu_4: 0.107891
METEOR: 0.178506
ROUGE_L: 0.290285
CIDEr: 0.955470
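For context, this is roughly how I computed the scores above with nlg-eval's Python API (a sketch; `hypotheses.txt` is my own file name, aligned line-by-line with your references.txt):

```python
from nlgeval import compute_metrics

# hypotheses.txt: one generated question per line (my file, hypothetical name)
# references.txt: the gold SQuAD questions, one per line, in the same order
metrics = compute_metrics(
    hypothesis='hypotheses.txt',
    references=['references.txt'],
)
print(metrics)  # dict with Bleu_1..Bleu_4, METEOR, ROUGE_L, CIDEr
```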
Do you pass only the sentence the SQuAD question is located in, or the entire paragraph? And which of the generated questions do you take as your hypothesis?
Looking forward to your answer!