patil-suraj / question_generation

Neural question generation using transformers
MIT License
1.11k stars · 348 forks

Generate exact number of questions #27

Open krrishdholakia opened 4 years ago

krrishdholakia commented 4 years ago

Hi,

Great work on the library 🎉, it's super useful.

Is it possible to generate a specific number of questions? I know we have `num_return_sequences`, but I've seen that despite specifying a high number of return sequences:

```python
model_args = {
    "max_length": 256,
    "num_beams": 12,
    "length_penalty": 1.5,
    "no_repeat_ngram_size": 3,
    "num_return_sequences": 10,
    "early_stopping": True,
}

nlp(text5, model_args)
```

I still get fewer questions than expected:

['The speed of light is slower in a medium other than what?', 'What is responsible for phenomena such as refraction?', 'The idea of light scattering from, or being absorbed and re-emitted by atoms, is both incorrect and what is not seen in nature?']

psinha30 commented 4 years ago

Can you provide the context? It gives me exactly the number of questions I ask for.

krrishdholakia commented 4 years ago

Sure -

I'm using the T5 model from the e2e `nlp` pipeline, and I run the call given above: `nlp(text5, model_args)`.

Any additional context I can give?

patil-suraj commented 4 years ago

thanks @krrishdholakia !

Here `num_return_sequences` can't be used, because the number of questions generated depends on the number of answers extracted. If the answer-extraction model gives only two answers, then only two questions will be generated.

`num_return_sequences` is used with beam search or top-k / top-p sampling in the `.generate` method. With beam search, in most cases it returns similar or slightly paraphrased versions of the same questions, so I'm not using `num_return_sequences`.

I'm trying out other methods for better answer extraction, but haven't got any good results yet. Will ping you if I find some other method to extract more answers.
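The control flow described above can be sketched in a few lines. This is a toy illustration of the answer-driven bound, not the library's actual code; `extract_answers` and `generate_question` are hypothetical stand-ins for the pipeline's two models:

```python
# Toy sketch: the answer-extraction step bounds the number of questions,
# so num_return_sequences on the question model cannot raise the count.
def generate_questions(text, extract_answers, generate_question):
    """One question per extracted answer."""
    answers = extract_answers(text)
    return [generate_question(text, ans) for ans in answers]

# Stand-in models that illustrate the bound:
fake_extract = lambda text: ["light", "refraction"]      # only 2 answers found
fake_qgen = lambda text, ans: f"What is {ans}?"

questions = generate_questions("some passage", fake_extract, fake_qgen)
print(questions)  # two questions, regardless of any generation settings
```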

krrishdholakia commented 4 years ago

hey @patil-suraj,

they mention an interesting approach using top-k sampling in this article - https://medium.com/huggingface/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313

thoughts on using this ?

patil-suraj commented 4 years ago

I have tried sampling, but beam search results are better than sampling for this task. Feel free to give it a try though!
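For anyone who wants to experiment, here is a minimal sketch of the sampling-related arguments that transformers' `.generate` accepts; the values are illustrative, not tuned for this task:

```python
# Illustrative generate() arguments for top-k / top-p sampling;
# do_sample=True switches decoding from beam search to sampling.
sampling_args = {
    "do_sample": True,           # sample instead of beam search
    "top_k": 50,                 # keep only the 50 most likely next tokens
    "top_p": 0.95,               # nucleus sampling: smallest token set with mass >= 0.95
    "num_return_sequences": 10,  # independent samples, so outputs vary more
    "max_length": 256,
}
# e.g. model.generate(**inputs, **sampling_args)
```

Sampling tends to yield more diverse questions at the cost of fluency, which is presumably the trade-off behind sticking with beam search here.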

nomoreoneday commented 3 years ago

> Here `num_return_sequences` can't be used, because the number of questions generated depends on the number of answers extracted. If the answer-extraction model gives only two answers, then only two questions will be generated.
>
> `num_return_sequences` is used with beam search or top-k / top-p sampling in the `.generate` method. With beam search, in most cases it returns similar or slightly paraphrased versions of the same questions, so I'm not using `num_return_sequences`.
>
> I'm trying out other methods for better answer extraction, but haven't got any good results yet.

Hi @patil-suraj, great work! May I ask how you control the number of answers extracted? My project comes up with a different number of answers for different texts.