Pull Request - Epic 7.7 - Syllabus Generator

By: Synthwave Sentinels

Summary

This PR allows educators to generate syllabi from a range of academic and strategic attributes. We based our research on this MIT article about syllabus creation. Educators can also supply course content as a file, so the syllabus adapts to the specific domain. The following file types are supported:

PDF
CSV
TXT
MD
URL (such as Wikipedia or another informational source)
PPTX
DOCX
XLS / XLSX
XML
GOOGLE DOCS
GOOGLE SHEETS
GOOGLE SLIDES
GOOGLE PDF
IMAGES (PNG / JPG / JPEG)

Changes

Created a new tool called syllabus_generator with its corresponding metadata.json.
Reused the document loaders from Epic 7.2 #61.
Created a new tool.py containing the new functionality and the Pydantic schemas for generating syllabi.
Created a new core.py that extends file support through the file_url and file_type attributes.
Reused the utility file allowed_file_extensions_dynamo.py from Epic 7.2 #61 for robust file type management.
Created new error classes to improve error mapping.
Created new prompt templates that handle text, YouTube videos and structured/tabular data separately.
Enabled a computer vision approach for generating syllabi from images, as an innovative proposal for the project.
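
The file_type dispatch and error mapping described above could look roughly like the sketch below. This is an illustration only: the names FileHandlerError, UnsupportedFileTypeError, LOADERS and load_documents are hypothetical and are not taken from core.py, and the two loaders are placeholders that do no real parsing.

```python
from typing import Callable, Dict, List


class FileHandlerError(Exception):
    """Hypothetical base class for file handling errors."""


class UnsupportedFileTypeError(FileHandlerError):
    """Raised when no loader is registered for the given file_type."""


def load_pdf(file_url: str) -> List[str]:
    # Placeholder: a real loader would download and parse the PDF.
    return [f"pdf content from {file_url}"]


def load_csv(file_url: str) -> List[str]:
    # Placeholder: a real loader would download and parse the CSV.
    return [f"csv content from {file_url}"]


# Hypothetical registry; the actual tool supports many more file types.
LOADERS: Dict[str, Callable[[str], List[str]]] = {
    "pdf": load_pdf,
    "csv": load_csv,
}


def load_documents(file_url: str, file_type: str) -> List[str]:
    """Pick a loader by normalized file_type and run it on file_url."""
    loader = LOADERS.get(file_type.strip().lower())
    if loader is None:
        raise UnsupportedFileTypeError(f"Unsupported file type: {file_type!r}")
    return loader(file_url)
```

A dedicated exception class lets the caller map an unsupported type to a clear client error instead of a generic failure.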

Testing

Each document loader was tested with pytest, covering typical scenarios and edge cases.
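
As an illustration of the kind of test involved, here is a minimal sketch. The function load_text_document is a hypothetical stand-in defined inline, not one of the real loaders, and the tests use plain assertions so that pytest can collect them without further setup.

```python
from typing import List


def load_text_document(raw: bytes) -> List[str]:
    """Hypothetical stand-in loader: decode UTF-8 and split into paragraphs."""
    text = raw.decode("utf-8").strip()
    if not text:
        raise ValueError("document is empty")
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def test_loader_splits_paragraphs():
    # Typical scenario: two paragraphs separated by a blank line.
    docs = load_text_document(b"Intro\n\nSimple regression\n\n")
    assert docs == ["Intro", "Simple regression"]


def test_loader_rejects_empty_document():
    # Edge case: whitespace-only input should be rejected.
    try:
        load_text_document(b"   \n")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty document")
```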

Results

All document loaders work as expected.

Notes

For Google Docs, Slides, Sheets and PDF files:

The files must be shared publicly in Google Drive.
The files must be uploaded to Google Drive. A file created natively in Google Drive has to be downloaded and then re-uploaded before it can be detected. This was the most suitable approach we found, because the GoogleDriveLoader from LangChain relies on OAuth2, which is not appropriate for production deployment.
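
One way a publicly shared Drive link can be fetched without OAuth2 is to turn it into a direct-download URL. The sketch below assumes the commonly used uc?export=download URL form and two common share-link shapes; it is not confirmed to be what this PR does, and the function names are hypothetical.

```python
import re
from typing import Optional

# Assumed share-link shapes; this list is not exhaustive.
_FILE_ID_PATTERNS = [
    re.compile(r"/d/([A-Za-z0-9_-]+)"),      # .../file/d/<id>/view
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),  # ...?id=<id>
]


def extract_drive_file_id(url: str) -> Optional[str]:
    """Return the file ID embedded in a Drive share link, if one is found."""
    for pattern in _FILE_ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None


def to_direct_download_url(url: str) -> str:
    """Build a direct-download URL for a publicly shared Drive file."""
    file_id = extract_drive_file_id(url)
    if file_id is None:
        raise ValueError(f"No Google Drive file ID found in: {url}")
    return f"https://drive.google.com/uc?export=download&id={file_id}"
```

The download only succeeds when the file is shared publicly, which matches the requirement stated above.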

How to Test

Clone the repo to your local machine.
Create and activate a virtual environment.
Run pip install -r requirements.txt to install the required libraries.
Create a .env file with the ENV_TYPE, GCP_PROJECT_ID and GOOGLE_API_KEY fields. ENV_TYPE is dev, GCP_PROJECT_ID is your project ID from the Google Cloud console, and GOOGLE_API_KEY is your API key from AI Studio.
Run ./local-start.sh to start the application.
Send a sample request for each of the file types listed above and verify the response.
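
Before starting the application, it can help to confirm that the three variables are actually set. The following is a minimal sketch: the helper missing_env_vars is hypothetical, and it inspects the process environment, so it assumes the .env values have already been loaded by whatever mechanism the project uses.

```python
import os
from typing import List, Mapping

# Variable names taken from the setup steps above.
REQUIRED_ENV_VARS = ("ENV_TYPE", "GCP_PROJECT_ID", "GOOGLE_API_KEY")


def missing_env_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```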

Request interface

Schema:

from typing import Optional

from pydantic import BaseModel

class SyllabusGeneratorArgsModel(BaseModel):
    grade_level: str
    course: str
    instructor_name: str
    instructor_title: str
    unit_time: str
    unit_time_value: int
    start_date: str
    assessment_methods: str
    grading_scale: str
    file_url: str
    file_type: str
    lang: Optional[str] = "en"

Example:

{
   "user":{
      "id":"string",
      "fullName":"string",
      "email":"string"
   },
   "type":"chat",
   "tool_data":{
      "tool_id":6,
      "inputs":[
         {
            "name":"syllabus_generator_args",
            "value":{
               "grade_level":"College",
               "course":"Linear Regression",
               "instructor_name":"Amrutha Vinayakam",
               "instructor_title":"Master in Artificial Intelligence",
               "unit_time":"week",
               "unit_time_value":8,
               "start_date":"July, 4th, 2024",
               "assessment_methods":"project and exams",
               "grading_scale":"In percentages (100%)",
               "file_url":"https://firebasestorage.googleapis.com/v0/b/kai-ai-f63c8.appspot.com/o/uploads%2F510f946e-823f-42d7-b95d-d16925293946-Linear%20Regression%20Stat%20Yale.pdf?alt=media&token=caea86aa-c06b-4cde-9fd0-42962eb72ddd",
               "file_type":"pdf",
               "lang": "zh"
            }
         }
      ]
   }
}
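
For scripted testing, the request envelope above can be assembled programmatically. This is a sketch using only the standard library: build_request is a hypothetical helper, the default tool_id of 6 is copied from the example rather than from any registry, and the file_url is a placeholder.

```python
import json
from typing import Any, Dict


def build_request(args: Dict[str, Any], user: Dict[str, str], tool_id: int = 6) -> Dict[str, Any]:
    """Wrap syllabus generator arguments in the request envelope shown above."""
    return {
        "user": user,
        "type": "chat",
        "tool_data": {
            "tool_id": tool_id,
            "inputs": [{"name": "syllabus_generator_args", "value": args}],
        },
    }


example_args = {
    "grade_level": "College",
    "course": "Linear Regression",
    "instructor_name": "Amrutha Vinayakam",
    "instructor_title": "Master in Artificial Intelligence",
    "unit_time": "week",
    "unit_time_value": 8,
    "start_date": "July, 4th, 2024",
    "assessment_methods": "project and exams",
    "grading_scale": "In percentages (100%)",
    "file_url": "https://example.com/sample.pdf",  # placeholder URL
    "file_type": "pdf",
    "lang": "zh",
}
example_user = {"id": "string", "fullName": "string", "email": "string"}

payload = build_request(example_args, example_user)
body = json.dumps(payload, ensure_ascii=False)  # JSON body ready to send
```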

Response interface

Schema:

from typing import List

from pydantic import BaseModel, Field

class CourseInformation(BaseModel):
    course_title: str = Field(description="The course title")
    grade_level: str = Field(description="The grade level")
    description: str = Field(description="The course description")

class InstructorInformation(BaseModel):
    name: str = Field(description="The instructor name")
    title: str = Field(description="The instructor title")
    description_title: str = Field(description="The description of the instructor title")

class CourseDescriptionObjectives(BaseModel):
    objectives: List[str] = Field(description="The course objectives")
    intended_learning_outcomes: List[str] = Field(description="The intended learning outcomes of the course")

class CourseContentItem(BaseModel):
    unit_time: str = Field(description="The unit of time for the course content")
    unit_time_value: int = Field(description="The unit of time value for the course content")
    topic: str = Field(description="The topic per unit of time for the course content")

class PoliciesProcedures(BaseModel):
    attendance_policy: str = Field(description="The attendance policy of the class")
    late_submission_policy: str = Field(description="The late submission policy of the class")
    academic_honesty: str = Field(description="The academic honesty policy of the class")

class AssessmentMethod(BaseModel):
    type_assessment: str = Field(description="The type of assessment")
    weight: int = Field(description="The weight of the assessment in the final grade")

class AssessmentGradingCriteria(BaseModel):
    assessment_methods: List[AssessmentMethod] = Field(description="The assessment methods")
    grading_scale: dict = Field(description="The grading scale")

class LearningResource(BaseModel):
    title: str = Field(description="The book title of the learning resource")
    author: str = Field(description="The book author of the learning resource")
    year: int = Field(description="The year of creation of the book")

class CourseScheduleItem(BaseModel):
    unit_time: str = Field(description="The unit of time for the course schedule item")
    unit_time_value: int = Field(description="The unit of time value for the course schedule item")
    date: str = Field(description="The date for the course schedule item")
    topic: str = Field(description="The topic for the course schedule item")
    activity_desc: str = Field(description="The description of the activity for the course schedule item")

class SyllabusSchema(BaseModel):
    course_information: CourseInformation = Field(description="The course information")
    instructor_information: InstructorInformation = Field(description="The instructor information")
    course_description_objectives: CourseDescriptionObjectives = Field(description="The objectives of the course")
    course_content: List[CourseContentItem] = Field(description="The content of the course")
    policies_procedures: PoliciesProcedures = Field(description="The policies and procedures of the course")
    assessment_grading_criteria: AssessmentGradingCriteria = Field(description="The assessment grading criteria of the course")
    learning_resources: List[LearningResource] = Field(description="The learning resources of the course")
    course_schedule: List[CourseScheduleItem] = Field(description="The course schedule")

Example:

{
  "data": {
    "course_information": {
      "course_title": "线性回归",
      "grade_level": "大学",
      "description": "本课程涵盖线性回归的基本概念和应用，包括简单线性回归、多元线性回归、模型评估和诊断等。"
    },
    "instructor_information": {
      "name": "Amrutha Vinayakam",
      "title": "人工智能硕士",
      "description_title": "在机器学习和数据分析领域拥有丰富经验的专业人士。"
    },
    "course_description_objectives": {
      "objectives": [
        "理解线性回归的基本原理和假设。",
        "学习如何使用最小二乘法拟合线性回归模型。",
        "掌握评估线性回归模型拟合优度的指标。",
        "了解如何识别和处理线性回归中的异常值和影响点。",
        "能够使用线性回归模型进行预测和推断。"
      ],
      "intended_learning_outcomes": [
        "学生将能够解释线性回归的概念及其在现实世界中的应用。",
        "学生将能够使用统计软件包（如R或Python）执行线性回归分析。",
        "学生将能够解释线性回归模型的结果并得出有意义的结论。",
        "学生将能够批判性地评估线性回归模型的适用性和局限性。"
      ]
    },
    "course_content": [
      {
        "unit_time": "周",
        "unit_time_value": 1,
        "topic": "线性回归简介"
      },
      {
        "unit_time": "周",
        "unit_time_value": 2,
        "topic": "简单线性回归"
      },
      {
        "unit_time": "周",
        "unit_time_value": 3,
        "topic": "多元线性回归"
      },
      {
        "unit_time": "周",
        "unit_time_value": 4,
        "topic": "模型评估与诊断"
      },
      {
        "unit_time": "周",
        "unit_time_value": 5,
        "topic": "异常值和影响点"
      },
      {
        "unit_time": "周",
        "unit_time_value": 6,
        "topic": "预测和推断"
      },
      {
        "unit_time": "周",
        "unit_time_value": 7,
        "topic": "线性回归的应用"
      },
      {
        "unit_time": "周",
        "unit_time_value": 8,
        "topic": "复习和总结"
      }
    ],
    "policies_procedures": {
      "attendance_policy": "鼓励学生按时上课，积极参与课堂讨论。",
      "late_submission_policy": "迟交作业将被扣分，具体扣分细则将在课程网站上公布。",
      "academic_honesty": "学生应遵守学术诚信原则，严禁任何形式的抄袭和作弊行为。"
    },
    "assessment_grading_criteria": {
      "assessment_methods": [
        {
          "type_assessment": "项目",
          "weight": 50
        },
        {
          "type_assessment": "考试",
          "weight": 50
        }
      ],
      "grading_scale": {
        "A": "90-100%",
        "B": "80-89%",
        "C": "70-79%",
        "D": "60-69%",
        "F": "低于60%"
      }
    },
    "learning_resources": [
      {
        "title": "线性回归分析",
        "author": "道格拉斯·C·蒙哥马利、伊丽莎白·A·佩克、G·杰弗里·维宁",
        "year": 2012
      },
      {
        "title": "R语言实战",
        "author": "罗伯特·科布洛夫",
        "year": 2015
      }
    ],
    "course_schedule": [
      {
        "unit_time": "周",
        "unit_time_value": 1,
        "date": "2024-07-04",
        "topic": "线性回归简介",
        "activity_desc": "介绍线性回归的基本概念、应用和课程安排。"
      },
      {
        "unit_time": "周",
        "unit_time_value": 2,
        "date": "2024-07-11",
        "topic": "简单线性回归",
        "activity_desc": "讲解简单线性回归模型、最小二乘法和模型评估指标。"
      },
      {
        "unit_time": "周",
        "unit_time_value": 3,
        "date": "2024-07-18",
        "topic": "多元线性回归",
        "activity_desc": "介绍多元线性回归模型、变量选择方法和模型解释。"
      },
      {
        "unit_time": "周",
        "unit_time_value": 4,
        "date": "2024-07-25",
        "topic": "模型评估与诊断",
        "activity_desc": "讲解模型诊断方法、残差分析和影响点分析。"
      },
      {
        "unit_time": "周",
        "unit_time_value": 5,
        "date": "2024-08-01",
        "topic": "异常值和影响点",
        "activity_desc": "讨论异常值和影响点的识别和处理方法。"
      },
      {
        "unit_time": "周",
        "unit_time_value": 6,
        "date": "2024-08-08",
        "topic": "预测和推断",
        "activity_desc": "讲解如何使用线性回归模型进行预测和推断。"
      },
      {
        "unit_time": "周",
        "unit_time_value": 7,
        "date": "2024-08-15",
        "topic": "线性回归的应用",
        "activity_desc": "介绍线性回归在不同领域的应用案例。"
      },
      {
        "unit_time": "周",
        "unit_time_value": 8,
        "date": "2024-08-22",
        "topic": "复习和总结",
        "activity_desc": "回顾课程内容，解答学生疑问，进行期末考试准备。"
      }
    ]
  }
}
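
Because the response is generated by a model, a few consistency checks on the "data" object can be useful when reviewing outputs. The sketch below is an assumption about what is worth checking, not part of the tool: check_syllabus is a hypothetical helper, and it assumes schedule dates are ISO formatted (YYYY-MM-DD) as in the example, which the schema itself does not guarantee.

```python
from datetime import date
from typing import Any, Dict, List


def check_syllabus(data: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable consistency problems (empty if none)."""
    problems: List[str] = []

    # Assessment weights are expected to add up to 100.
    methods = data["assessment_grading_criteria"]["assessment_methods"]
    total = sum(method["weight"] for method in methods)
    if total != 100:
        problems.append(f"assessment weights sum to {total}, expected 100")

    # Content and schedule should cover the same number of units.
    content, schedule = data["course_content"], data["course_schedule"]
    if len(content) != len(schedule):
        problems.append("course_content and course_schedule differ in length")

    # Schedule dates should be in chronological order (ISO format assumed).
    dates = [date.fromisoformat(item["date"]) for item in schedule]
    if dates != sorted(dates):
        problems.append("course_schedule dates are not in chronological order")

    return problems
```

Calling check_syllabus(response["data"]) on a parsed response returns an empty list when all three checks pass.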

References

Request Templates Docs Responses
