The 'num_worksheets' variable indicates how many worksheets will be created.
The other variables indicate how many fill-in-the-blank, multiple-choice, open-ended, and true or false questions will be created in each worksheet.
The result is a list of dictionaries, where each dictionary represents one worksheet.
Structure of the worksheet dictionary:
'description': string with a description of the topic.
'fill_in_blank': list of dictionaries, where each dictionary represents one fill-in-the-blank question. Each fill-in-the-blank question dictionary has the structure:
    'question': question string with the key word or concept removed and replaced by "___".
    'answer': the correct answer (the removed key word or concept).
'multiple_choice': list of dictionaries, where each dictionary represents one multiple-choice question. Each multiple-choice question dictionary has the structure:
    'question': question string.
    'choices': list of choice dictionaries, where each choice dictionary has the structure:
        'key': letter (A, B, C, D).
        'value': possible answer.
    'answer': the correct choice (A, B, C, D).
'open_ended': list of dictionaries, where each dictionary represents one open-ended question. Each open-ended question dictionary has the structure:
    'question': question string.
    'answer': answer string.
'true_false': list of dictionaries, where each dictionary represents one true or false question. Each true or false question dictionary has the structure:
    'question': question string.
    'answer': answer string (one word: 'True' or 'False').
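For illustration, a single generated worksheet dictionary might look like the example below (the topic and question text are invented for this sketch, not actual generator output):

```python
# Illustrative example of one worksheet dictionary; content invented for this sketch.
worksheet = {
    "description": "Photosynthesis is the process by which plants convert light energy into chemical energy.",
    "fill_in_blank": [
        {"question": "Photosynthesis takes place mainly in the ___ of plant cells.",
         "answer": "chloroplasts"},
    ],
    "multiple_choice": [
        {"question": "Which gas is consumed during photosynthesis?",
         "choices": [
             {"key": "A", "value": "Oxygen"},
             {"key": "B", "value": "Carbon dioxide"},
             {"key": "C", "value": "Nitrogen"},
             {"key": "D", "value": "Hydrogen"},
         ],
         "answer": "B"},
    ],
    "open_ended": [
        {"question": "Explain why photosynthesis is important for life on Earth.",
         "answer": "It produces oxygen and forms the base of most food chains."},
    ],
    "true_false": [
        {"question": "Photosynthesis releases carbon dioxide as a byproduct.",
         "answer": "False"},
    ],
}
```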
Code structure
In features/worksheet/tools.py we created class WorksheetBuilder with the method 'create_worksheets', which generates the desired number of worksheets. We limit the number of worksheets per request:

```python
if num_worksheets > 10:
    return {"message": "error", "data": "Number of Worksheets cannot exceed 10"}
```

This limit is only for testing purposes and can be removed (or changed to a different number) in the final version.
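As a rough sketch of how 'create_worksheets' could assemble each worksheet from the question classes described below (the constructor, attribute names, and call signatures here are assumptions for illustration; only the method name and the limit come from this document):

```python
# Hypothetical sketch only: attribute names and call signatures are assumptions.
class WorksheetBuilder:
    def __init__(self, summary, fill_in_blank, multiple_choice, open_ended, true_false):
        # Each argument is assumed to be an instance of the corresponding question class.
        self.summary = summary
        self.fill_in_blank = fill_in_blank
        self.multiple_choice = multiple_choice
        self.open_ended = open_ended
        self.true_false = true_false

    def create_worksheets(self, num_worksheets, num_fill_in_blank,
                          num_multiple_choice, num_open_ended, num_true_false):
        if num_worksheets > 10:
            return {"message": "error", "data": "Number of Worksheets cannot exceed 10"}
        worksheets = []
        for _ in range(num_worksheets):
            worksheets.append({
                "description": self.summary.create_questions(1),
                "fill_in_blank": self.fill_in_blank.create_questions(num_fill_in_blank),
                "multiple_choice": self.multiple_choice.create_questions(num_multiple_choice),
                "open_ended": self.open_ended.create_questions(num_open_ended),
                "true_false": self.true_false.create_questions(num_true_false),
            })
        return worksheets
```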
For generating questions we created the following classes:
class QuestionBase with methods:
'create_questions' tries to generate the indicated number of questions.
'validate_response' makes sure a generated question obeys the format requirements.
'not_unique' checks whether a new question duplicates an already created one (within one section of the worksheet), so that only unique questions are kept. We used an embedding model to calculate similarities between new and already created questions.
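A minimal sketch of such an embedding-based uniqueness check (the embedding model, import path, and the 0.9 threshold are assumptions; the actual implementation may differ):

```python
# Hypothetical sketch of a cosine-similarity uniqueness check.
# Embedding model, import path, and the 0.9 threshold are assumptions for illustration.
import numpy as np
from langchain_google_vertexai import VertexAIEmbeddings

embeddings = VertexAIEmbeddings(model_name="textembedding-gecko")

def not_unique(new_question: str, existing_questions: list[str], threshold: float = 0.9) -> bool:
    """Return True if the new question is too similar to an already created one."""
    if not existing_questions:
        return False
    new_vec = np.array(embeddings.embed_query(new_question))
    for question in existing_questions:
        old_vec = np.array(embeddings.embed_query(question))
        cosine = np.dot(new_vec, old_vec) / (np.linalg.norm(new_vec) * np.linalg.norm(old_vec))
        if cosine >= threshold:
            return True
    return False
```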
For each section of the worksheet ('description', 'fill_in_blank', 'multiple_choice', 'open_ended', 'true_false') we created a subclass:
class Summary(QuestionBase)
class FillInTheBlankQuestion(QuestionBase)
class MultipleChoiceQuestion(QuestionBase)
class OpenEndedQuestion(QuestionBase)
class TrueFalseQuestion(QuestionBase)
Each subclass has its own 'validate_response' method, since the formats of the different question types differ.
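As an example of a subclass-specific check, a 'validate_response' for multiple-choice questions might verify the keys and the answer letter. This is a hedged sketch; the actual validation rules are not shown in this document:

```python
# Hypothetical sketch of what MultipleChoiceQuestion.validate_response might check;
# the real validation logic may differ.
def validate_response(response: dict) -> bool:
    required_keys = {"question", "choices", "answer"}
    if not required_keys.issubset(response):
        return False
    letters = [choice.get("key") for choice in response["choices"]]
    # Expect the four lettered choices and an answer that matches one of them.
    return letters == ["A", "B", "C", "D"] and response["answer"] in letters
```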
For response formatting we used format instructions.
Each section has its own parser. Some of the parsers are identical:
```python
class TrueFalseQuestionFormat(BaseModel):
    question: str = Field(description="The question text")
    answer: str = Field(description="The correct answer")

class OpenEndedQuestionFormat(BaseModel):
    question: str = Field(description="The question text")
    answer: str = Field(description="The correct answer")
```
But we decided to keep them separate in case we need to adjust them independently later, for example if the formatting requirements for one question type change.
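For reference, this is roughly how such a format class can be wired into a prompt using LangChain's PydanticOutputParser (the prompt wording is illustrative; only the format class comes from this document):

```python
# Sketch of wiring a format class into format instructions via LangChain.
# The prompt text is illustrative, not the project's actual prompt.
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field

class TrueFalseQuestionFormat(BaseModel):
    question: str = Field(description="The question text")
    answer: str = Field(description="The correct answer")

parser = PydanticOutputParser(pydantic_object=TrueFalseQuestionFormat)

prompt = PromptTemplate(
    template="Create one true or false question about {topic}.\n{format_instructions}",
    input_variables=["topic"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
# The model's raw text output can then be parsed back into the format class:
# result = parser.parse(llm_output)
```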
Testing
We tested the worksheet generator for different topics and different grade levels.
Topics: Math, Computer Science, Biology, Philosophy, Chemistry, Literature, Visual Arts, History.
Grade levels: Middle School, High School, Undergraduate University (one of: freshman, sophomore, junior, senior), Graduate School (one of: master's, doctoral), Postdoctoral.
The generator worked well, with no significant errors.
Possible improvements
For question generation we used:

```python
"model": VertexAI(model="gemini-1.0-pro", temperature=0.4)
```

Sometimes it produces inaccurate questions. For example:
{'question': 'In Shakespeare's play "Romeo and Juliet," Juliet is 13 years old.', 'answer': 'False'}
The correct answer is True. This can be improved by using a better model, or, with a more involved approach, by adding a correctness check through internet search.
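A hedged sketch of what such an internet-search correctness check could look like, using a generic web-search tool plus a second model pass (the tool, import paths, prompt, and function are illustrative only and are not part of the current implementation):

```python
# Illustrative sketch of an internet-search correctness check; not implemented in the project.
# Import paths and the search tool are assumptions for this example.
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_google_vertexai import VertexAI

def check_true_false_answer(question: str, proposed_answer: str) -> str:
    """Ask the model to re-judge a true/false answer using web search snippets as evidence."""
    evidence = DuckDuckGoSearchRun().run(question)
    model = VertexAI(model="gemini-1.0-pro", temperature=0)
    prompt = (
        f"Statement: {question}\n"
        f"Proposed answer: {proposed_answer}\n"
        f"Web evidence: {evidence}\n"
        "Based only on the evidence, answer with one word, 'True' or 'False', "
        "indicating whether the statement itself is true."
    )
    return model.invoke(prompt).strip()
```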