indrajithi / genquest

Automatic question generation by using NLP

NLP: Automatic Question Generation

This program takes a text file as an input and generates questions by analyzing each sentence.

Note: A similar implementation is here.

Usage

Virtualenv recommended

pip install -r requirements.txt

python -m textblob.download_corpora

python3 quest.py file.txt

Use the -v option to enable verbose output:

python3 quest.py file.txt -v

You can also try passing in any other text file.
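For readers who want to see how a command line like the one above might be handled, here is a minimal sketch using Python's standard argparse module. It is not taken from quest.py: the helper name build_parser, the attribute names file and verbose, and the help strings are all assumptions made for illustration.

```python
import argparse

def build_parser():
    """Hypothetical parser for: python3 quest.py file.txt [-v]"""
    parser = argparse.ArgumentParser(
        prog="quest.py",
        description="Generate questions from a text file.",
    )
    # Positional argument: the text file to analyze.
    parser.add_argument("file", help="path to the input text file")
    # Optional flag: -v switches on verbose output.
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="print tags and intermediate data")
    return parser

# Parse the same arguments as in the usage example above.
args = build_parser().parse_args(["file.txt", "-v"])
print(args.file, args.verbose)
```

Because the flag uses store_true, leaving out -v gives verbose a value of False.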

How does this work?

A text file is passed as an argument to the program.

The text file is read using a Python package called textblob. Each paragraph is broken down into sentences by the function parse(string), and each sentence is then passed as a string to the function genQuestion(line).
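The paragraph-to-sentence flow described above can be sketched with the standard library alone. This is only an illustration: a naive regular expression stands in for textblob's sentence splitting, split_sentences is a hypothetical helper, and the handle_sentence callback takes the place of genQuestion(line). The real parse(string) may work differently.

```python
import re

def split_sentences(paragraph):
    """Naive stand-in for a real sentence tokenizer:
    split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', paragraph.strip())
    return [part for part in parts if part]

def parse(string, handle_sentence):
    """Walk the text paragraph by paragraph and hand each sentence on."""
    for paragraph in string.split('\n'):
        if not paragraph.strip():
            continue  # skip blank lines between paragraphs
        for sentence in split_sentences(paragraph):
            handle_sentence(sentence)

text = ("Bansoori is an Indian classical instrument. "
        "Akhil plays Bansoori and Guitar.\n\n"
        "Priya writes poems.")

collected = []
parse(text, collected.append)  # collect sentences instead of generating questions
print(collected)
```

A regular expression like this breaks on abbreviations such as "e.g.", which is one reason to use a proper tokenizer in practice.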

These are the part-of-speech tags used in this demo.

NNS   Noun, plural
JJ    Adjective
NNP   Proper noun, singular
VBG   Verb, gerund or present participle
VBN   Verb, past participle
VBZ   Verb, 3rd person singular present
VBD   Verb, past tense
IN    Preposition or subordinating conjunction
PRP   Personal pronoun
NN    Noun, singular or mass

Ref: Alphabetical list of part-of-speech tags used in the Penn Treebank Project

This program uses a small list of tag combinations.

    l1 = ['NNP', 'VBG', 'VBZ', 'IN']
    l2 = ['NNP', 'VBG', 'VBZ']
    l3 = ['PRP', 'VBG', 'VBZ', 'IN']
    l4 = ['PRP', 'VBG', 'VBZ']
    l5 = ['PRP', 'VBG', 'VBD']
    l6 = ['NNP', 'VBG', 'VBD']
    l7 = ['NN', 'VBG', 'VBZ']
    l8 = ['NNP', 'VBZ', 'JJ']
    l9 = ['NNP', 'VBZ', 'NN']
    l10 = ['NNP', 'VBZ']
    l11 = ['PRP', 'VBZ']
    l12 = ['NNP', 'NN', 'IN']
    l13 = ['NN', 'VBZ']
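The README does not say how these lists are checked against a sentence. One plausible reading, sketched below, is that a combination matches when every tag in it occurs somewhere in the sentence, and that the lists are tried in order. The names PATTERNS and first_match are hypothetical, and the matching rule itself is an assumption.

```python
# Tag combinations copied from the list above, in the same order (l1..l13).
PATTERNS = [
    ['NNP', 'VBG', 'VBZ', 'IN'],
    ['NNP', 'VBG', 'VBZ'],
    ['PRP', 'VBG', 'VBZ', 'IN'],
    ['PRP', 'VBG', 'VBZ'],
    ['PRP', 'VBG', 'VBD'],
    ['NNP', 'VBG', 'VBD'],
    ['NN', 'VBG', 'VBZ'],
    ['NNP', 'VBZ', 'JJ'],
    ['NNP', 'VBZ', 'NN'],
    ['NNP', 'VBZ'],
    ['PRP', 'VBZ'],
    ['NNP', 'NN', 'IN'],
    ['NN', 'VBZ'],
]

def first_match(sentence_tags, patterns=PATTERNS):
    """Return the first combination whose tags all occur in the sentence."""
    present = set(sentence_tags)
    for pattern in patterns:
        if all(tag in present for tag in pattern):
            return pattern
    return None  # no combination applies, so no question would be generated

# Tags for "Priya writes poems."
print(first_match(['NNP', 'VBZ', 'NNS']))
```

Under this rule the order matters: longer, more specific combinations come first so they are not shadowed by shorter ones such as ['NNP', 'VBZ'].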

Each sentence is matched against simple English grammar rules implemented as conditional statements. A dictionary called bucket is created, mapping each part-of-speech tag to the position of its first occurrence in the sentence.
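A dictionary like bucket can be built in a few lines. The sketch below assumes the input is a list of (word, tag) pairs, as printed in the verbose output further down, and that each tag is stored with the index of its first occurrence, which is what the printed dictionaries suggest. The function name build_bucket is hypothetical.

```python
def build_bucket(tagged_words):
    """Map each part-of-speech tag to the index where it first appears."""
    bucket = {}
    for index, (_word, tag) in enumerate(tagged_words):
        # setdefault keeps the earliest index and ignores later repeats.
        bucket.setdefault(tag, index)
    return bucket

tagged = [('Bansoori', 'NNP'), ('is', 'VBZ'), ('an', 'DT'),
          ('Indian', 'JJ'), ('classical', 'JJ'), ('instrument', 'NN')]
print(build_bucket(tagged))
```

For this sentence the result has the same key-value pairs as the dictionary shown in the verbose output, although the key order may differ.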

Each sentence that is parsed successfully produces a question. The list of generated questions is printed as output.

This demo only uses the grammar to generate questions starting with 'what'.
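To make the idea concrete, here is a rough sketch of turning a tagged sentence into a 'what' question. It is not the logic of genQuestion: the function name what_question, the choice of the first NNP, NN or PRP before the verb as the subject, and the crude rule of dropping a trailing "s" to get the base verb are all assumptions made for illustration.

```python
def what_question(tagged_words):
    """Build a 'What ...?' question from (word, tag) pairs, or return None."""
    # Index of the first occurrence of each tag.
    first = {}
    for index, (_word, tag) in enumerate(tagged_words):
        first.setdefault(tag, index)

    verb_at = first.get('VBZ')
    subject_at = min((first[t] for t in ('NNP', 'NN', 'PRP') if t in first),
                     default=None)
    # Require a subject that comes before a 3rd person singular verb.
    if verb_at is None or subject_at is None or subject_at > verb_at:
        return None

    subject = tagged_words[subject_at][0]
    verb = tagged_words[verb_at][0]
    if verb == 'is':
        return f"What is {subject}?"
    # Crude base form: "plays" -> "play". Irregular verbs need a lemmatizer.
    base = verb[:-1] if verb.endswith('s') else verb
    return f"What does {subject} {base}?"

print(what_question([('Bansoori', 'NNP'), ('is', 'VBZ'), ('an', 'DT'),
                     ('Indian', 'JJ'), ('classical', 'JJ'), ('instrument', 'NN')]))
print(what_question([('Priya', 'NNP'), ('writes', 'VBZ'), ('poems', 'NNS')]))
```

The trailing "s" rule fails for verbs such as "goes" or "has", so a real implementation would want proper lemmatization.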

Example

Sentence:

-----------INPUT TEXT-------------

Bansoori is an Indian classical instrument. Akhil plays Bansoori and Guitar. 
Puliyogare is a South Indian dish made of rice and tamarind. 
Priya writes poems. 

Osmosis is the movement of a solvent across a semipermeable membrane toward a higher concentration of solute. In biological systems, the solvent is typically water, but osmosis can occur in other liquids, supercritical liquids, and even gases.
When a cell is submerged in water, the water molecules pass through the cell membrane from an area of low solute concentration to high solute concentration. For example, if the cell is submerged in saltwater, water molecules move out of the cell. If a cell is submerged in freshwater, water molecules move into the cell.

Raja-Yoga is divided into eight steps, the first is Yama -- non - killing, truthfulness, non - stealing, continence, and non - receiving of any gifts.
Next is Niyama -- cleanliness, contentment, austerity, study, and self - surrender to God. 

-----------INPUT END---------------

Generated questions.

 Question: What is Bansoori?

 Question: What does Akhil play?

 Question: What is Puliyogare?

 Question: What does Priya write?

 Question: What is Osmosis?

 Question: What is solvent?

 Question: What is cell?

 Question: What is example?

 Question: What is cell?

 Question: What is Raja-Yoga?

 Question: What is Niyama?

Verbose mode, enabled with the -v argument, shows more of the question generation process.

Output with the verbose option:

 Bansoori is an Indian classical instrument. 

TAGS: [('Bansoori', 'NNP'), ('is', 'VBZ'), ('an', 'DT'), ('Indian', 'JJ'), ('classical', 'JJ'), ('instrument', 'NN')] 

{'NN': 5, 'JJ': 3, 'VBZ': 1, 'DT': 2, 'NNP': 0}

 Question: What is Bansoori?

 --------------------
Akhil plays Bansoori and Guitar. 

TAGS: [('Akhil', 'NNP'), ('plays', 'VBZ'), ('Bansoori', 'NNP'), ('and', 'CC'), ('Guitar', 'NNP')] 

{'CC': 3, 'VBZ': 1, 'NNP': 0}

 Question: What does Akhil play?

 --------------------
Puliyogare is a South Indian dish made of rice and tamarind. 

TAGS: [('Puliyogare', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('South', 'JJ'), ('Indian', 'JJ'), ('dish', 'NN'), ('made', 'VBN'), ('of', 'IN'), ('rice', 'NN'), ('and', 'CC'), ('tamarind', 'NN')] 

{'JJ': 3, 'IN': 7, 'NNP': 0, 'DT': 2, 'NN': 5, 'CC': 9, 'VBZ': 1, 'VBN': 6}

 Question: What is Puliyogare?

 --------------------
Priya writes poems. 

TAGS: [('Priya', 'NNP'), ('writes', 'VBZ'), ('poems', 'NNS')] 

{'VBZ': 1, 'NNS': 2, 'NNP': 0}

 Question: What does Priya write?

 --------------------
Osmosis is the movement of a solvent across a semipermeable membrane toward a higher concentration of solute. 

TAGS: [('Osmosis', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('movement', 'NN'), ('of', 'IN'), ('a', 'DT'), ('solvent', 'JJ'), ('across', 'IN'), ('a', 'DT'), ('semipermeable', 'JJ'), ('membrane', 'NN'), ('toward', 'IN'), ('a', 'DT'), ('higher', 'JJR'), ('concentration', 'NN'), ('of', 'IN'), ('solute', 'NN')] 

{'JJ': 6, 'IN': 4, 'DT': 2, 'NN': 0, 'VBZ': 1, 'JJR': 13}

 Question: What is Osmosis?

How to improve this program.

Reference

Alphabetical list of part-of-speech tags used in the Penn Treebank Project

Automatic Factual Question Generation from Text

TextBlob: Simplified Text Processing

Automatic Question Generation from Paragraph

K2Q: Generating Natural Language Questions from Keywords with User Refinements

Infusing NLU into Automatic Question Generation

Literature Review of Automatic Question Generation Systems

Neural Question Generation from Text: A Preliminary Study

Learning to Ask: Neural Question Generation for Reading Comprehension [Apr 2017]

SQuAD: The Stanford Question Answering Dataset