thinc-org / cugetreg

A course registration planning application for CU students
https://cugetreg.com

[BACKLOG] Automate GenEd courses classification #402

Open bombnp opened 2 years ago

bombnp commented 2 years ago

Problem

Currently, courses are tagged as GenEd using override objects in MongoDB that map (courseNo, studyProgram, semester, academicYear) to (genEdType, sections). However, these override objects are created manually by our maintainers and uploaded to MongoDB.
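To make the shape of that mapping concrete, here is a minimal sketch in Python. The field names come from the description above; the class names, the `lookup_gen_ed` helper, and all sample values are hypothetical and are not taken from the cugetreg codebase or its MongoDB schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class OverrideKey:
    """Identifies one course offering (field names from the issue text)."""
    course_no: str
    study_program: str
    semester: str
    academic_year: str


@dataclass(frozen=True)
class GenEdOverride:
    """What the override assigns to that offering."""
    gen_ed_type: str            # one of "SO", "HU", "SC", "IN"
    sections: Tuple[str, ...]   # sections the override applies to


def lookup_gen_ed(
    overrides: Dict[OverrideKey, GenEdOverride],
    key: OverrideKey,
    section: str,
) -> Optional[str]:
    """Return the GenEd type for a section, or None if no override applies."""
    override = overrides.get(key)
    if override is None or section not in override.sections:
        return None
    return override.gen_ed_type


# Placeholder values for illustration only.
example_key = OverrideKey("0000000", "S", "1", "2565")
example_overrides = {example_key: GenEdOverride("SC", ("1", "2"))}
```

With this layout, a section that is not listed in the override is simply treated as not GenEd, which is one possible reading of why `sections` is part of the mapped value.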

The current process of GenEd classification is:

  1. Go to https://cas.reg.chula.ac.th/cu/cs/QueryCourseScheduleNew/index.html
  2. Inspect element and remove the type="HIDDEN" attribute from the genedcode field to enable it
  3. Query a list of GenEd courses using genedcodes (1 = SO, 2 = HU, 3 = SC, 4 = IN)
  4. Go through the courses one by one and manually evaluate whether each course is REALLY GenEd (the ones most students must enroll in).
  5. Create a csv of all manually-verified GenEd courses, and upload them to MongoDB (via cugetreg-api) as override objects
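As a rough illustration of steps 3 and 5, the sketch below turns rows of a CSV into override-shaped dictionaries. The genedcode mapping (1 = SO, 2 = HU, 3 = SC, 4 = IN) and the field names are from this issue; the CSV column layout, the `;` separator for sections, and the function name are assumptions. The upload through cugetreg-api is deliberately left out because its interface is not described here.

```python
import csv
import io
from typing import Dict, List

# Mapping stated in step 3 of the process.
GENED_CODE_TO_TYPE = {"1": "SO", "2": "HU", "3": "SC", "4": "IN"}


def rows_to_overrides(csv_text: str) -> List[Dict[str, object]]:
    """Convert CSV rows into override-shaped dicts.

    Assumed (hypothetical) columns: courseNo, studyProgram, semester,
    academicYear, genedcode, sections (section numbers joined by ';').
    """
    overrides = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["genedcode"].strip()
        if code not in GENED_CODE_TO_TYPE:
            raise ValueError(f"unknown genedcode: {code!r}")
        overrides.append({
            "courseNo": row["courseNo"].strip(),
            "studyProgram": row["studyProgram"].strip(),
            "semester": row["semester"].strip(),
            "academicYear": row["academicYear"].strip(),
            "genEdType": GENED_CODE_TO_TYPE[code],
            "sections": [s.strip() for s in row["sections"].split(";") if s.strip()],
        })
    return overrides


# Placeholder data for illustration only.
sample_csv = (
    "courseNo,studyProgram,semester,academicYear,genedcode,sections\n"
    "0000000,S,1,2565,3,1;2\n"
)
```

Rejecting unknown genedcodes up front is a design choice for this sketch: a typo in the CSV fails loudly instead of producing an override with a missing type.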

Obviously, this process is painstaking and prone to human error.

Task Description

Develop a way to automatically tag courses as GenEd (or at least require as little manual work as possible) so that the process is easier for future generations to maintain. One current idea is to infer from the section's notes whether it is one of:

  1. definitely GenEd
  2. definitely NOT GenEd
  3. not sure, needs human verification
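One way the three outcomes above could be sketched is a simple marker match over a section's note. This is only an illustration of the three-way split, not a proposed solution: the marker lists are passed in by the caller, and the sample markers and notes below are hypothetical placeholders, since this issue does not say what the real notes contain.

```python
from enum import Enum
from typing import Iterable


class Verdict(Enum):
    GENED = "definitely GenEd"
    NOT_GENED = "definitely NOT GenEd"
    UNSURE = "not sure, needs human verification"


def classify_note(
    note: str,
    gened_markers: Iterable[str],
    non_gened_markers: Iterable[str],
) -> Verdict:
    """Classify a section note by case-insensitive substring match.

    A note that matches both kinds of marker, or neither, is left for a human.
    """
    text = note.casefold()
    has_gened = any(m.casefold() in text for m in gened_markers)
    has_non_gened = any(m.casefold() in text for m in non_gened_markers)
    if has_gened and not has_non_gened:
        return Verdict.GENED
    if has_non_gened and not has_gened:
        return Verdict.NOT_GENED
    return Verdict.UNSURE


# Hypothetical markers; real ones would have to come from the actual notes.
SAMPLE_GENED_MARKERS = ["gened"]
SAMPLE_NON_GENED_MARKERS = ["majors only"]
```

Defaulting to "needs human verification" whenever the evidence is absent or conflicting keeps the automated part conservative, so maintainers only review the ambiguous cases.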

The solution remains to be discussed further.

Additional Context

Currently, we have course data from https://cas.reg.chula.ac.th. We could try to obtain data from other sources. Consult @bombnp if you want to request access to data we don't currently have.

Related Teams

Task Advisors

@bombnp

bombnp commented 2 years ago

After some basic exploratory data analysis with @panus2001, we decided it would be better to consult professors on how to accurately determine GenEd courses, since there must be a way to validate this when we graduate. We will start by asking Proadpran.