jhuopensource / semesterly

Your Semester Made Easier | Course Scheduling & Social Tools for Students
https://semester.ly
GNU General Public License v3.0

Update course evals #1038

Closed JiaqiWang18 closed 8 months ago

JiaqiWang18 commented 11 months ago

Contact Details

No response

Is your feature request related to a problem? Please describe.

Many users have asked us to update the course evaluations on Semester.ly.

Describe the solution you'd like.

Take evaluations published by the official source and add them to course modals in a format similar to: https://jhu.semester.ly/course/EN.660.332/Fall/2023

There does not seem to be an API for this data, so one solution is to create a web crawler and ingest the data manually.

Describe alternatives you've considered

Have users submit course evals through semester.ly. Not recommended unless we offer different questions from those on the official forms.

Additional Information

No response

jchen324 commented 11 months ago

Some research into course evaluation:

We should discuss further how to proceed with implementing this feature.

JiaqiWang18 commented 10 months ago

Current Evaluation Ingestion Steps

First, run the ingest command to convert the HTML to JSON. Then run the digest command to save the JSON to the database.

How to run current ingestor

python manage.py ingest jhu --types evals

Note: the --years and --terms flags don't work. The command generates parsing/schools/jhu/data/evals.json.

How to run current digestor

The digestor loads the JSON data and saves it into the database. It has multiple digestion strategies to ensure data consistency.

Evals.json format

A list of objects in the following shape:

{
  "course": {
    "code": "AS.310.305"
  },
  "instructors": [
    {
      "name": "Marvin Ott"
    }
  ],
  "kind": "eval",
  "score": 4.32,
  "summary": "Students praised the course...",
  "term": "Fall",
  "year": "2013"
}
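
To make the format concrete, here is a minimal sketch of a check that one entry matches the shape shown above. The function name `validate_eval` and the `REQUIRED_KEYS` set are hypothetical and not part of the Semester.ly codebase; the rules are inferred only from the single example object, so the real digestor may accept or require more than this.

```python
import json

# Keys inferred from the single example object in this issue.
# This is an assumption, not an official Semester.ly schema.
REQUIRED_KEYS = {"course", "instructors", "kind", "score", "summary", "term", "year"}


def validate_eval(record):
    """Return a list of problems found in one evals.json entry (empty if none)."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        return ["missing keys: " + ", ".join(sorted(missing))]

    problems = []
    course = record["course"]
    if not isinstance(course, dict) or "code" not in course:
        problems.append("course must be an object with a 'code'")

    instructors = record["instructors"]
    if not isinstance(instructors, list) or not all(
        isinstance(person, dict) and "name" in person for person in instructors
    ):
        problems.append("instructors must be a list of objects with a 'name'")

    if record["kind"] != "eval":
        problems.append("kind must be 'eval'")

    score = record["score"]
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        problems.append("score must be a number")

    # The example stores the year as a string, so the sketch expects one too.
    if not isinstance(record["year"], str):
        problems.append("year must be a string such as '2013'")

    return problems


sample = json.loads("""
{
  "course": {"code": "AS.310.305"},
  "instructors": [{"name": "Marvin Ott"}],
  "kind": "eval",
  "score": 4.32,
  "summary": "Students praised the course...",
  "term": "Fall",
  "year": "2013"
}
""")

print(validate_eval(sample))
```

Running a check like this over every entry before digestion would catch a malformed record early, without touching the database.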

Next steps

Currently, the ingestor (evals.py) uses BeautifulSoup to parse the HTML pages. Since we now need to use Selenium and the source requires authentication, it is probably better to run this part locally for security reasons, generate a JSON file, and give that file directly to the digestor.

For the digestion step, we should aim to reuse the existing code in digestor.py, because it already has robust logic to validate records, reconcile differences, and avoid duplicates. So we should generate a JSON file in the same format as the one above and pass it directly to the digestor.
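
As a rough sketch of the last step of such a local script, the snippet below turns already-scraped values into entries with the same keys as the example object and serializes them to JSON. The names `build_eval_record`, `evals_to_json`, and `write_evals` are hypothetical helpers, not existing Semester.ly functions, and the scraping itself (Selenium, login) is deliberately left out because its details are not settled in this thread.

```python
import json


def build_eval_record(code, instructor_names, score, summary, term, year):
    """Build one entry using the same keys as the example evals.json object."""
    return {
        "course": {"code": code},
        "instructors": [{"name": name} for name in instructor_names],
        "kind": "eval",
        "score": float(score),
        "summary": summary,
        "term": term,
        # The example object stores the year as a string.
        "year": str(year),
    }


def evals_to_json(records):
    """Serialize a list of entries to a JSON string."""
    return json.dumps(records, indent=2, sort_keys=True)


def write_evals(records, path):
    """Write the entries to a file that could then be handed to the digestor."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(evals_to_json(records))


# Values copied from the example object in this issue.
record = build_eval_record(
    "AS.310.305",
    ["Marvin Ott"],
    4.32,
    "Students praised the course...",
    "Fall",
    2013,
)
print(evals_to_json([record]))
```

Calling `write_evals([record], "evals.json")` would produce a file in the list-of-objects layout described above; whether the digestor needs it at parsing/schools/jhu/data/evals.json or accepts another path is something to confirm against the existing command.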