uWaterloo / OpenData

Help and Support for University of Waterloo Open Data Initiative
https://api.uwaterloo.ca

Course prerequisite issue #193

Closed: joe-edwin closed this issue 5 years ago

joe-edwin commented 6 years ago

Currently, some course prerequisite fields contain errors. For example, STAT 230 does not list MATH 137 as one of its prerequisites. It would also be nice if the field could be made more consistent. For example, MATH 136 has these as its prerequisites:

  <prerequisites_parsed>
    <item>1</item>
    <item>MATH135</item>
    <item>MATH145</item>
  </prerequisites_parsed>

And CS 245 has these:

  <prerequisites_parsed>
    <item>
      <item>1</item>
      <item>CS136</item>
      <item>CS138</item>
      <item>CS146</item>
    </item>
    <item>MATH135</item>
  </prerequisites_parsed>

As you can see, this is a little inconsistent. It would be nice if the prerequisites for MATH 136 were wrapped in a single item tag, so that they look like this:

  <prerequisites_parsed>
    <item>
      <item>1</item>
      <item>MATH135</item>
      <item>MATH145</item>
    </item>
  </prerequisites_parsed>

That would make the format consistent across courses.
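A minimal Python sketch of the requested normalization follows. It assumes the JSON form of `prerequisites_parsed` mirrors the XML above, with a list whose leading integer N meaning "at least N of the following" and a list without a leading integer meaning "all of these". The function name `normalize_prereqs` is hypothetical and not part of the Open Data API.

```python
def normalize_prereqs(parsed):
    """Wrap a bare top-level choice group so it matches the nested form.

    Assumed encoding (inferred from the examples in this thread):
    - a list whose first element is an int N means "at least N of the rest";
    - a list without a leading int means "all of these items".
    Anything else (e.g. the empty string for "nothing parsed") is returned as-is.
    """
    if isinstance(parsed, list) and parsed and isinstance(parsed[0], int):
        # [1, 'MATH135', 'MATH145'] becomes [[1, 'MATH135', 'MATH145']]
        return [parsed]
    return parsed


print(normalize_prereqs([1, "MATH135", "MATH145"]))
# [[1, 'MATH135', 'MATH145']]
print(normalize_prereqs([[1, "CS136", "CS138", "CS146"], "MATH135"]))
# [[1, 'CS136', 'CS138', 'CS146'], 'MATH135']
```

With this wrapping, every top-level entry is either a single course or a group, so MATH 136 and CS 245 can be read by the same client code.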

Thanks.

steverweber commented 5 years ago

Many of the API's prerequisites_parsed values have issues!

In its current state, prerequisites_parsed should carry a LARGE disclaimer or be removed altogether.

We don't want to give students bad data when selecting courses!

I work for MFCF and will try to drum up interest in fixing what I assume the root issue is.

Using a large free-form text field is troublesome database design.
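To illustrate the point, here is a hedged Python sketch of a structured alternative. The class names (`CourseReq`, `ProgramReq`, `Group`) and their fields are hypothetical, not an existing MFCF or Open Data schema. The point is that grade thresholds and program restrictions, which the list-based prerequisites_parsed drops, become explicit fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class CourseReq:
    """A completed course, optionally with a minimum grade in percent."""
    code: str
    min_grade: Optional[int] = None


@dataclass
class ProgramReq:
    """Enrollment in one of the named academic plans."""
    plans: List[str]


@dataclass
class Group:
    """At least `min_count` of `items` must hold (min_count == len(items) means all)."""
    min_count: int
    items: List["Req"] = field(default_factory=list)


Req = Union[CourseReq, ProgramReq, Group]

# MATH 136, encoded from the calendar text quoted below:
# (MATH 135 with >= 60% or MATH 145; Honours Mathematics or Mathematics/ELAS
# students) or Science Mathematical Physics students.
MATH136 = Group(1, [
    Group(2, [
        Group(1, [CourseReq("MATH135", min_grade=60), CourseReq("MATH145")]),
        ProgramReq(["Honours Mathematics", "Mathematics/ELAS"]),
    ]),
    ProgramReq(["Science Mathematical Physics"]),
])
```

Storing requisites this way (or in equivalent relational tables) would let a checker evaluate grade and program conditions directly instead of inferring them from prose.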

Examples of bad data:

MATH136
- Prereq: (MATH 135 with a grade of at least 60% or MATH 145; Honours Mathematics or Mathematics/ELAS students) or Science Mathematical Physics students.
# *** WRONG
- prerequisites_parsed: [[1, 'MATH135', 'MATH145']]

MATH145
- Prereq: 4U Calculus and Vectors or 4U Mathematics of Data Management; Honours Mathematics students only.
# *** WRONG
- prerequisites_parsed: ''

BIOL377
- Prereq: BIOL 273 or PSYCH 261; at least one of MATH 127, 137, PHYS 111, 115, 121
# *** WRONG
- prerequisites_parsed: [[1, 'BIOL273', 'PSYCH261'], 'MATH127', 'MATH137', 'PHYS111', 'PHYS115', 'PHYS121']
- expected: [[1, 'BIOL273', 'PSYCH261'], [1, 'MATH127', 'MATH137', 'PHYS111', 'PHYS115', 'PHYS121']]

PHYS175
- Prereq: One of PHYS 111,115,121; one of MATH 116, 117, 127, 137,147
# OK
- prerequisites_parsed: [[1, 'PHYS111', 'PHYS115', 'PHYS121'], [1, 'MATH116', 'MATH117', 'MATH127', 'MATH137', 'MATH147']]
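A small evaluator makes the cost of the BIOL 377 error concrete. This is a minimal Python sketch. The function `satisfies` is hypothetical, and its semantics (leading integer N means "at least N of the rest", a list without one means "all of these") are inferred from the examples above, not documented by the API. Under that reading, a student with BIOL 273 and MATH 137, who meets the calendar requirement, is rejected by the current parse but accepted by the expected one.

```python
def satisfies(req, completed):
    """Return True if the set of completed course codes meets `req`.

    Assumed semantics (inferred from the examples, not documented):
    - a string is a single course that must be completed;
    - a list starting with an int N means "at least N of the rest";
    - any other list means "all of these".
    """
    if isinstance(req, str):
        return req in completed
    if req and isinstance(req[0], int):
        needed, options = req[0], req[1:]
        return sum(satisfies(r, completed) for r in options) >= needed
    return all(satisfies(r, completed) for r in req)


student = {"BIOL273", "MATH137"}  # meets the BIOL 377 calendar text
current = [[1, "BIOL273", "PSYCH261"], "MATH127", "MATH137", "PHYS111", "PHYS115", "PHYS121"]
expected = [[1, "BIOL273", "PSYCH261"], [1, "MATH127", "MATH137", "PHYS111", "PHYS115", "PHYS121"]]
print(satisfies(current, student))   # False: the flat parse demands all five courses
print(satisfies(expected, student))  # True
```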
sbobkin commented 5 years ago

Hi,

In the legacy (currently running) version of the Open Data API, this is done by applying rules that parse a string field from Quest containing the requisite descriptions. Clearly the rules are insufficient, and some of the data is not accurate. Because the descriptions are so broad, no amount of automated parsing will produce a correct result in every case.
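The rule-based approach can be illustrated with a deliberately small Python sketch. The regex and the rules (";" read as AND, "one of" or "or" read as a choose-one group, bare numbers inheriting the preceding subject) are assumptions for illustration, not the API's actual parser. It handles the PHYS 175 and BIOL 377 strings quoted earlier in the thread, but on MATH 136 it silently drops the program restriction and the outer "or", reproducing the incorrect [[1, 'MATH135', 'MATH145']].

```python
import re

# Hypothetical rule: "SUBJ 123" or a bare "123" that inherits the most recent
# subject, so "MATH 127, 137" yields MATH127 and MATH137.
TOKEN = re.compile(r"([A-Z]{2,})?\s*(\d{3}[A-Z]?)")


def parse_clause(clause):
    """Parse one ';'-separated clause into a choice group, a course, or []."""
    codes, subject = [], None
    for subj, num in TOKEN.findall(clause):
        subject = subj or subject
        if subject:
            codes.append(subject + num)
    lowered = clause.lower()
    if len(codes) > 1 and ("one of" in lowered or " or " in lowered):
        return [1] + codes
    return codes[0] if len(codes) == 1 else codes


def parse_prereq(text):
    """Split on ';' (read as AND) and keep only clauses that mention courses."""
    body = text.split(":", 1)[-1]  # drop a leading "Prereq:" label if present
    return [p for p in (parse_clause(c) for c in body.split(";")) if p]


print(parse_prereq("Prereq: (MATH 135 with a grade of at least 60% or MATH 145; "
                   "Honours Mathematics or Mathematics/ELAS students) or "
                   "Science Mathematical Physics students."))
# [[1, 'MATH135', 'MATH145']]
```

Every new phrasing in the calendar needs another rule, which is why this style of parser stays brittle.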

It's a long-term to-do to see whether using the raw requirement-group rules, which check some of these conditions at enrollment, would give a better result, but this is not a "soon" feature.

As it stands, we'll be deprecating this endpoint for other reasons (including the page it scrapes data from going away) and replacing the content with data that we can be sure of and that is verified to be accurate. It will not include parsed requisites to begin with.

We'd be more than happy to integrate this data from any other campus projects that may go through the process of modeling this work.

steverweber commented 5 years ago

As it stands we'll be deprecating this endpoint for other reasons (including the page it scrapes data from going away

What source are you using?

We'd be more than happy to integrate this data from any other campus projects that may go through the process of modeling this work.

@sbobkin I'll try to keep your group updated on the progress.

sbobkin commented 5 years ago

What source are you using?

@steverweber It's a mix of scraping the course catalog pages and the schedule of classes at http://www.adm.uwaterloo.ca/infocour/CIR/SA/under.html

steverweber commented 5 years ago

Thanks. FYI, you can reach me at s8weber@uwaterloo.ca if you want to take some of the discussion offline.