Update course evals

JiaqiWang18 commented 11 months ago

Contact Details

No response

Is your feature request related to a problem? Please describe.

Many users have asked us to update the course evaluations on Semester.ly.

Describe the solution you'd like.

Take the evaluations published by the official source and add them to course modals in a format similar to: https://jhu.semester.ly/course/EN.660.332/Fall/2023

There does not seem to be an API for this data, so one solution is to write a web crawler and ingest the data ourselves.

Describe alternatives you've considered

Have users submit course evals through Semester.ly. Not recommended unless we offer questions different from those on the official forms.

Additional Information

No response

Code of Conduct

[X] I agree to follow Semester.ly's Code of Conduct

jchen324 commented 11 months ago

Some research into course evaluation:

How evaluations before 2015 were injected into the database:
- HTML files containing all courses and their evaluations were downloaded and stored in parsing/schools/jhu/HopkinsEvaluations
- The parser parsing/schools/jhu/evals.py was run to parse the HTML and inject the evaluations into the database

However, HTML files containing all courses and evaluations are no longer available, so this method does not work now:
- The current evaluation website mandates cookies and sessions, requiring a JHED login
- Parsing is difficult because the website returns only a portion of the evaluation results in HTML with no clear class, id, or name, and users have to click "show more results" to send an XHR that fetches additional results
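
To make the "show more results" problem concrete, here is a minimal sketch of the accumulation loop a crawler would need. It is an assumption-heavy illustration, not the site's actual protocol: `fetch_page`, the `(rows, next_token)` page shape, and the token values are all hypothetical, and the real call would have to be an authenticated XHR behind a JHED session.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical page shape: (rows on this page, token for the next page or None).
Page = Tuple[List[Dict[str, str]], Optional[str]]


def collect_all_results(fetch_page: Callable[[Optional[str]], Page],
                        max_pages: int = 100) -> List[Dict[str, str]]:
    """Keep requesting pages until the site reports there are no more results."""
    rows: List[Dict[str, str]] = []
    token: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a bad token cannot loop forever
        page_rows, token = fetch_page(token)
        rows.extend(page_rows)
        if token is None:
            break
    return rows


# Stand-in for the real authenticated request, used only to exercise the loop.
def fake_fetch(token: Optional[str]) -> Page:
    pages = {
        None: ([{"code": "AS.310.305"}], "page-2"),
        "page-2": ([{"code": "EN.660.332"}], None),
    }
    return pages[token]


all_rows = collect_all_results(fake_fetch)
```

Passing the fetch step in as a callable keeps the loop testable without a login; the real fetcher could be swapped in later.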

We should discuss further how to proceed with implementing this feature.

JiaqiWang18 commented 10 months ago

Current Evaluation Ingestion Steps

First run the ingest command to convert HTML to JSON. Then run the digest command to save the JSON to the database.

How to run current ingestor

python manage.py ingest jhu --types evals

Note: the --years and --terms flags don't work. The command generates parsing/schools/jhu/data/evals.json

How to run current digestor

The digestor loads the JSON data and saves it into the database. It has multiple digestion strategies to ensure data consistency.

python manage.py digest jhu --types evals --years 2015

`Evals.json` format

A list of objects of the following form:

{
  "course": {
    "code": "AS.310.305"
  },
  "instructors": [
    {
      "name": "Marvin Ott"
    }
  ],
  "kind": "eval",
  "score": 4.32,
  "summary": "Students praised the course...",
  "term": "Fall",
  "year": "2013"
}
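
A quick way to check that a generated record matches the shape above before handing it to the digestor. This is only a sketch based on the sample object: `validate_eval` is a hypothetical helper, not part of digestor.py, and the rules it enforces (for example, `year` being a string) are inferred from the one example shown.

```python
from typing import Any, Dict, List


def validate_eval(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record matches the sample shape."""
    problems: List[str] = []
    course = record.get("course")
    if not isinstance(course, dict) or not isinstance(course.get("code"), str):
        problems.append("course.code must be a string")
    instructors = record.get("instructors")
    if not isinstance(instructors, list) or not all(
        isinstance(i, dict) and isinstance(i.get("name"), str) for i in instructors
    ):
        problems.append("instructors must be a list of {'name': str}")
    if record.get("kind") != "eval":
        problems.append("kind must be 'eval'")
    score = record.get("score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        problems.append("score must be a number")
    for key in ("summary", "term", "year"):
        if not isinstance(record.get(key), str):
            problems.append(f"{key} must be a string")
    return problems


sample = {
    "course": {"code": "AS.310.305"},
    "instructors": [{"name": "Marvin Ott"}],
    "kind": "eval",
    "score": 4.32,
    "summary": "Students praised the course...",
    "term": "Fall",
    "year": "2013",
}
sample_problems = validate_eval(sample)
```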

Next steps

Currently, the ingestor (evals.py) uses BeautifulSoup to parse the HTML files. Since we now need Selenium and authentication, it is probably better to run this part locally for security reasons, generate a JSON file, and give that file directly to the digestor.

For the digestion step, we should aim to reuse the existing code in digestor.py because it already has robust logic to validate records, reconcile differences, and avoid duplicates. So we should generate a JSON file in the same format as the one above so that we can give it directly to the digestor.
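
A rough sketch of the local step that turns scraped rows into JSON in the evals.json format. The input row keys (`code`, `instructors`, `score`, `summary`, `term`, `year`) and the function names are assumptions for illustration; the real scraper output may look different.

```python
import json
from typing import Any, Dict, Iterable


def to_eval_record(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one scraped row (hypothetical keys) onto the evals.json object shape."""
    return {
        "course": {"code": row["code"]},
        "instructors": [{"name": name} for name in row["instructors"]],
        "kind": "eval",
        "score": float(row["score"]),
        "summary": row["summary"],
        "term": row["term"],
        "year": str(row["year"]),  # the sample stores the year as a string
    }


def evals_to_json(rows: Iterable[Dict[str, Any]]) -> str:
    """Serialize all rows as a JSON list, matching the existing file layout."""
    return json.dumps([to_eval_record(row) for row in rows], indent=2)


# Example row, reusing the values from the sample object above.
scraped_rows = [
    {
        "code": "AS.310.305",
        "instructors": ["Marvin Ott"],
        "score": "4.32",
        "summary": "Students praised the course...",
        "term": "Fall",
        "year": 2013,
    }
]
evals_json = evals_to_json(scraped_rows)
```

The resulting string could then be written to parsing/schools/jhu/data/evals.json, the path the current ingestor generates, so the digestor picks it up unchanged.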

jhuopensource / semesterly

Update course evals #1038
