ramsrigouthamg / Questgen.ai

Question generation using state-of-the-art Natural Language Processing algorithms
https://questgen.ai/
MIT License

Which datasets are used? #7

Closed: thomas-chauvet closed this issue 4 years ago

thomas-chauvet commented 4 years ago

Hello,

This project is really interesting!

Could you share which datasets were used for training? I couldn't find this information in the code.

Thanks in advance!

ramsrigouthamg commented 4 years ago

Hi @thomas-chauvet! Thanks for checking. You can find a few pointers here:
https://towardsdatascience.com/generating-boolean-yes-no-questions-from-any-content-using-t5-text-to-text-transformer-model-69f2744aff44
https://towardsdatascience.com/paraphrase-any-question-with-t5-text-to-text-transfer-transformer-pretrained-model-and-cbb9e35f1555

The main datasets used are Quora Question Pairs, BoolQ, SQuAD and MS MARCO.

thomas-chauvet commented 4 years ago

Thank you for your answer!