marvelai-org / marvel-ai-backend

This is the Marvel Teaching Assistant ai repo.
MIT License

Squad Robo Geniuses, Worksheet feature #68

Open Dim314159 opened 4 months ago

Dim314159 commented 4 months ago

Description and formatting

We created a worksheet generator, located in features/worksheet.

It uses tool ID "3" from api/tools_config.json:

    "3": {
        "path": "features.worksheet.core",
        "metadata_file": "metadata.json"
    }

Here is the input configuration in features/worksheet/metadata.json:

{
    "inputs": [
        {
            "label": "Topic or text",
            "name": "topic",
            "type": "text"
        },
        {
            "label": "Grade level",
            "name": "grade_level",
            "type": "text"
        },
        {
            "label": "Number of Worksheets",
            "name": "num_worksheets",
            "type": "number"
        },
        {
            "label": "Number of Fill in the Blank Questions",
            "name": "num_fill_in_blank",
            "type": "number"
        },
        {
            "label": "Number of Multiple Choice Questions",
            "name": "num_multiple_choice",
            "type": "number"
        },
        {
            "label": "Number of Open-Ended Questions",
            "name": "num_open_ended",
            "type": "number"
        },
        {
            "label": "Number of True or False Questions",
            "name": "num_true_false",
            "type": "number"
        }
    ]
}

So our request body may look like:

{
    "user": {
        "id": "string",
        "fullName": "string",
        "email": "string"
    },
    "type": "tool",
    "tool_data": {
        "tool_id": 3,
        "inputs": [
            {
                "name": "topic",
                "value": "Visual Arts"
            },
            {
                "name": "grade_level",
                "value": "Undergraduate University (sophomore)"
            },
            {
                "name": "num_worksheets",
                "value": 3
            },
            {
                "name": "num_fill_in_blank",
                "value": 2
            },
            {
                "name": "num_multiple_choice",
                "value": 3
            },
            {
                "name": "num_open_ended",
                "value": 1
            },
            {
                "name": "num_true_false",
                "value": 4
            }
        ]
    }
}
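A request like the one above could be checked against the metadata before the tool runs. The following is a minimal sketch, not the repository's actual validation: `validate_inputs` is a hypothetical helper, and the rule that "text" means a string and "number" means an int or float is an assumption.

```python
# Input names and types copied from features/worksheet/metadata.json (labels omitted).
METADATA = {
    "inputs": [
        {"name": "topic", "type": "text"},
        {"name": "grade_level", "type": "text"},
        {"name": "num_worksheets", "type": "number"},
        {"name": "num_fill_in_blank", "type": "number"},
        {"name": "num_multiple_choice", "type": "number"},
        {"name": "num_open_ended", "type": "number"},
        {"name": "num_true_false", "type": "number"},
    ]
}


def validate_inputs(metadata, inputs):
    """Hypothetical helper: return a list of problems (empty if inputs match)."""
    expected = {item["name"]: item["type"] for item in metadata["inputs"]}
    provided = {item["name"]: item["value"] for item in inputs}
    problems = []
    for name, kind in expected.items():
        if name not in provided:
            problems.append(f"missing input: {name}")
            continue
        value = provided[name]
        if kind == "text" and not isinstance(value, str):
            problems.append(f"{name} should be text")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif kind == "number" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            problems.append(f"{name} should be a number")
    for name in sorted(provided.keys() - expected.keys()):
        problems.append(f"unexpected input: {name}")
    return problems


# The "inputs" list from the sample request body.
sample_inputs = [
    {"name": "topic", "value": "Visual Arts"},
    {"name": "grade_level", "value": "Undergraduate University (sophomore)"},
    {"name": "num_worksheets", "value": 3},
    {"name": "num_fill_in_blank", "value": 2},
    {"name": "num_multiple_choice", "value": 3},
    {"name": "num_open_ended", "value": 1},
    {"name": "num_true_false", "value": 4},
]

print(validate_inputs(METADATA, sample_inputs))  # → []
```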

The 'num_worksheets' variable indicates how many worksheets will be created. The other variables indicate how many fill-in-the-blank, multiple-choice, open-ended, and true-or-false questions will be created in each worksheet.

The result is a list of dictionaries, where each dictionary represents one worksheet. Structure of the worksheet dictionary:
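To make the per-worksheet counts concrete, here is a small sketch that totals the questions for the sample request. The helper names (`questions_per_worksheet`, `total_questions`) are made up for illustration and are not part of the repository.

```python
# Per-worksheet question counts; num_worksheets multiplies the whole set.
QUESTION_COUNT_KEYS = (
    "num_fill_in_blank",
    "num_multiple_choice",
    "num_open_ended",
    "num_true_false",
)


def questions_per_worksheet(params):
    """Sum the four per-worksheet question counts."""
    return sum(params[key] for key in QUESTION_COUNT_KEYS)


def total_questions(params):
    """Every worksheet gets the same counts, so multiply by num_worksheets."""
    return params["num_worksheets"] * questions_per_worksheet(params)


# Values from the sample request body.
params = {
    "num_worksheets": 3,
    "num_fill_in_blank": 2,
    "num_multiple_choice": 3,
    "num_open_ended": 1,
    "num_true_false": 4,
}

print(questions_per_worksheet(params))  # → 10
print(total_questions(params))          # → 30
```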

Code structure

In features/worksheet/tools.py we created the class WorksheetBuilder with method:

For generating questions we created the class QuestionBase with methods:

For each section of the worksheet ('description', 'fill_in_blank', 'multiple_choice', 'open_ended', 'true_false') we created a subclass of QuestionBase, in the same order: Summary, FillInTheBlankQuestion, MultipleChoiceQuestion, OpenEndedQuestion, and TrueFalseQuestion. Each subclass has its own 'validate_response' method, because each question type has a different format.
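The per-subclass validation described above might look roughly like the sketch below. Only the class names and the `validate_response` method name come from this issue; the method signature, the bodies, and the `_has_text_fields` helper are assumptions for illustration, and only two of the five subclasses are shown.

```python
def _has_text_fields(response, keys):
    """Hypothetical helper: True if every key maps to a non-empty string."""
    return isinstance(response, dict) and all(
        isinstance(response.get(key), str) and response[key].strip() != ""
        for key in keys
    )


class QuestionBase:
    """Shared base; the real class has other methods not listed in this issue."""

    def validate_response(self, response):
        raise NotImplementedError


class OpenEndedQuestion(QuestionBase):
    def validate_response(self, response):
        # Any non-empty question and answer text is accepted.
        return _has_text_fields(response, ("question", "answer"))


class TrueFalseQuestion(QuestionBase):
    def validate_response(self, response):
        # Same fields as open-ended, but the answer must be True or False.
        return (
            _has_text_fields(response, ("question", "answer"))
            and response["answer"].strip().lower() in {"true", "false"}
        )


candidate = {"question": "Impressionism began in France.", "answer": "Maybe"}
print(OpenEndedQuestion().validate_response(candidate))  # → True
print(TrueFalseQuestion().validate_response(candidate))  # → False
```

The same payload passes one validator and fails the other, which is why a single shared check would not be enough.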

For response formatting we used the parser's format instructions, passed to the prompt as a partial variable:

prompt = PromptTemplate(
    template=self.prompt_template,
    input_variables=["topic", "grade_level"],
    partial_variables={"format_instructions": self.parser.get_format_instructions()},
)
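What pre-binding a variable amounts to can be illustrated with the standard library alone. This is a stand-in for the idea, not LangChain's implementation: the template text and the instruction string below are invented for the example, and `functools.partial` plays the role of `partial_variables`.

```python
from functools import partial

# Hypothetical template; the real prompt text is not shown in this issue.
PROMPT_TEMPLATE = (
    "Write one true-or-false question about {topic} "
    "for {grade_level} students.\n"
    "{format_instructions}"
)

# Stand-in for what a parser's get_format_instructions() would return.
FORMAT_INSTRUCTIONS = 'Return a JSON object with the keys "question" and "answer".'

# Bind format_instructions once; topic and grade_level are supplied per call.
render_prompt = partial(
    PROMPT_TEMPLATE.format, format_instructions=FORMAT_INSTRUCTIONS
)

prompt_text = render_prompt(topic="Visual Arts", grade_level="High School")
print(prompt_text)
```

The caller only ever supplies the two input variables, which matches `input_variables=["topic", "grade_level"]` in the snippet above.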

Each section has its own parser. Some parsers are identical:

# Assumed import; the project may use LangChain's pydantic shim instead.
from pydantic import BaseModel, Field


class TrueFalseQuestionFormat(BaseModel):
    question: str = Field(description="The question text")
    answer: str = Field(description="The correct answer")


class OpenEndedQuestionFormat(BaseModel):
    question: str = Field(description="The question text")
    answer: str = Field(description="The correct answer")

We decided to keep them separate anyway, so that each can be adjusted independently if formatting requirements change in the future.

Testing

We tested the worksheet generator on different topics and grade levels.

Topics: Math, Computer Science, Biology, Philosophy, Chemistry, Literature, Visual Arts, History.

Grade levels: Middle School, High School, Undergraduate University (one of: freshman, sophomore, junior, senior), Graduate School (one of: masters, doctoral), Postdoctoral.

The generator worked well, with no significant errors.

Possible improvements

For question generation we used "model": VertexAI(model="gemini-1.0-pro", temperature=0.4). It sometimes produces inaccurate questions. For example:

{'question': 'In Shakespeare\'s play "Romeo and Juliet," Juliet is 13 years old.', 'answer': 'False'}

The correct answer is True. This could be improved by using a better model or, in a more involved approach, by checking correctness through an internet search.
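A correctness check could be wired in as a pluggable step after generation. The sketch below only shows the shape of such a hook: `flag_suspect_answers` and `lookup_verifier` are hypothetical names, and the dictionary lookup is a stand-in for whatever real verifier (a stronger model or a search-backed check) would be used.

```python
def flag_suspect_answers(questions, verify):
    """Return the questions whose stored answer disagrees with the verifier.

    `verify` takes the question text and returns the expected answer,
    or None when it cannot decide.
    """
    suspects = []
    for item in questions:
        expected = verify(item["question"])
        if expected is not None and expected != item["answer"]:
            suspects.append({**item, "expected": expected})
    return suspects


# Stand-in verifier: a fixed lookup table instead of a real search or model.
KNOWN_FACTS = {
    'In Shakespeare\'s play "Romeo and Juliet," Juliet is 13 years old.': "True",
}


def lookup_verifier(question):
    return KNOWN_FACTS.get(question)


generated = [
    {
        "question": 'In Shakespeare\'s play "Romeo and Juliet," Juliet is 13 years old.',
        "answer": "False",
    },
    {"question": "A question the verifier knows nothing about.", "answer": "True"},
]

suspects = flag_suspect_answers(generated, lookup_verifier)
print(len(suspects))            # → 1
print(suspects[0]["expected"])  # → True
```

Questions the verifier cannot decide are left alone rather than flagged, so an incomplete verifier does not block generation.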