patil-suraj / question_generation

Neural question generation using transformers
MIT License

Evaluating the QG models #75

Open KristiyanVachev opened 3 years ago

KristiyanVachev commented 3 years ago

Hi! I've started working on my own QG algorithm for my Master's thesis and I'm trying to learn how to evaluate a model.

Since you've posted your metrics, I've been trying to replicate them, but I'm having some trouble. Can you tell me what your approach was?

I am using nlg-eval as well, and from your references.txt I conclude that you generate a hypothesis for each question in the SQuAD dev set.

I ran your t5-base model on the first 1000 references by:

  1. Passing only the sentence in which the answer to the original SQuAD question is located
  2. Taking the first generated question from your model (since it generates multiple candidate questions); a rough sketch of my setup follows this list
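
Roughly, the hypothesis-generation step looks like this. This is only a minimal sketch: it assumes the `pipeline` interface from this repo's `pipelines.py`, and the file names `sentences.txt` / `hypothesis.txt` are placeholders of mine.

```python
# Sketch of generating one hypothesis per SQuAD dev question.
# Assumes this repo's pipelines.py is importable; file names are placeholders.
from pipelines import pipeline

nlp = pipeline("question-generation", model="valhalla/t5-base-qg-hl")

with open("sentences.txt") as f_in, open("hypothesis.txt", "w") as f_out:
    for sentence in f_in:
        # The pipeline returns a list of {"answer": ..., "question": ...} dicts.
        qa_pairs = nlp(sentence.strip())
        if qa_pairs:
            # Take the first generated question as the hypothesis for this reference.
            f_out.write(qa_pairs[0]["question"] + "\n")
        else:
            f_out.write("\n")  # keep line alignment with references.txt
```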

The results I got from nlg-eval were:

Bleu_1: 0.295304
Bleu_2: 0.202987
Bleu_3: 0.144872
Bleu_4: 0.107891
METEOR: 0.178506
ROUGE_L: 0.290285
CIDEr: 0.955470
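
For completeness, I computed these scores with nlg-eval's Python API along these lines (file names again placeholders; the extra flags just skip the embedding-based metrics I didn't need):

```python
# Sketch of scoring the generated questions against the SQuAD questions.
# hypothesis.txt: one generated question per line; references.txt: the matching
# reference questions, line-aligned. Both file names are placeholders.
from nlgeval import compute_metrics

metrics = compute_metrics(
    hypothesis="hypothesis.txt",
    references=["references.txt"],
    no_skipthoughts=True,  # skip SkipThoughts cosine similarity
    no_glove=True,         # skip GloVe-based embedding metrics
)
print(metrics)  # dict with Bleu_1..Bleu_4, METEOR, ROUGE_L, CIDEr
```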

Do you pass only the sentence in which the SQuAD answer is located, or the entire paragraph? And which of the generated questions do you take as your hypothesis?

yqinolive commented 3 years ago

Looking forward to the answer ...