kmehan / gt-course-critique

Georgia Institute of Technology Course Critique

Incorrect Course GPA #37

Open jeffchenoweth opened 9 years ago

jeffchenoweth commented 9 years ago

When a Professor has multiple sections of the same course in the same term, the Critique calculates a simple average of the sections to determine the overall distribution.

For instance, http://critique.gatech.edu/course.php?id=CS6035 indicates that Prof. Ahamad's sections in Fall 2013 had a 31% withdrawal rate. In fact there were two sections of the class (http://critique.gatech.edu/prof.php?id=AHAMADMUSTAQUE#CS6035): one had a 50% withdrawal rate with only 2 students, and the other had a 12% withdrawal rate with 25 students. It appears the site is using a simple average: (50 + 12) / 2 = 31. The real combined W rate should be the enrollment-weighted average: (0.50 × 2 + 0.12 × 25) / (2 + 25) ≈ 15%.
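The same arithmetic as a small Python sketch, using only the two section figures quoted above. The function names are illustrative and are not taken from the Course Critique code.

```python
# Each tuple is (withdrawal rate in percent, number of students) for one section,
# taken from the two Fall 2013 sections described above.
sections = [(50.0, 2), (12.0, 25)]


def simple_average(sections):
    """Unweighted mean of the per-section rates (what the site appears to do)."""
    return sum(rate for rate, _ in sections) / len(sections)


def weighted_average(sections):
    """Mean of the per-section rates, weighted by each section's enrollment."""
    total_students = sum(n for _, n in sections)
    return sum(rate * n for rate, n in sections) / total_students


print(round(simple_average(sections)))    # 31
print(round(weighted_average(sections)))  # 15
```

The two-student section moves the simple average as much as the 25-student section does, which is why the reported figure is roughly double the combined rate.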

I believe this is related to #16

cobookman commented 9 years ago

@jeffchenoweth It'd be fairly trivial to make the averages weighted. I'm guessing that's what you're asking for.

jeffchenoweth commented 9 years ago

Yes, when adding multiple populations, to get the new combined population distribution you must take the weighted average of the underlying distributions. Adding the percentages from each population and dividing by the number of populations has no meaning.

From lines 20 and 21 in course.php:

$raw = $db->pdo->prepare("SELECT profID, Prof, Size, ROUND(AVG(GPA),2), ROUND(AVG(A)), ROUND(AVG(B)), ROUND(AVG(C)), ROUND(AVG(D)), ROUND(AVG(F)), ROUND(AVG(W)) FROM Data WHERE courseID = :id AND GPA != 0 GROUP BY Prof");
$avg = $db->pdo->prepare("SELECT ROUND(AVG(GPA),2), ROUND(AVG(A)), ROUND(AVG(B)), ROUND(AVG(C)), ROUND(AVG(D)), ROUND(AVG(F)) FROM Data WHERE courseID = :id AND GPA != 0"); 

If columns A, B, C, D, F, and W are percentages, taking the average of each column does not yield the A, B, C, D, F, and W distribution of the combined population, so AVG() cannot be used.

Similarly, GPA is a population statistic that must be recalculated for the combined population. Simply averaging this column does not work.
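A rough sketch of what a weighted version of the per-professor query might look like, run against an in-memory SQLite table so it is self-contained. It assumes a numeric `Enrollment` column, which is hypothetical and does not appear in the query above. The GPA values are invented for illustration; only the W rates and section sizes come from the example earlier in this issue. Weighting GPA by enrollment is itself an approximation, since withdrawn students presumably receive no grade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the Data table; Enrollment is an assumed column.
conn.execute(
    "CREATE TABLE Data (courseID TEXT, Prof TEXT, Enrollment INTEGER, GPA REAL, W REAL)"
)
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?, ?, ?)",
    [
        ("CS6035", "Ahamad", 2, 3.0, 50.0),   # GPA value is illustrative only
        ("CS6035", "Ahamad", 25, 3.5, 12.0),  # GPA value is illustrative only
    ],
)

# SUM(x * weight) / SUM(weight) replaces AVG(x) for each aggregated column.
WEIGHTED_SQL = """
SELECT Prof,
       ROUND(SUM(GPA * Enrollment) * 1.0 / SUM(Enrollment), 2) AS gpa,
       ROUND(SUM(W * Enrollment) * 1.0 / SUM(Enrollment)) AS w
FROM Data
WHERE courseID = :id AND GPA != 0
GROUP BY Prof
"""

row = conn.execute(WEIGHTED_SQL, {"id": "CS6035"}).fetchone()
print(row)  # (professor, weighted GPA, weighted W rate in percent)
```

The A, B, C, D, and F columns would follow the same `SUM(x * Enrollment) / SUM(Enrollment)` pattern.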

cobookman commented 9 years ago

@jeffchenoweth Ahh, I just remembered why this is painful. We didn't have the class size of each section due to FERPA. All we were given is the approximate size, e.g. Large (31-49 Students).

So we could "weight" it somewhat by small, medium, large, xlarge. But that's about it.
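A minimal sketch of that bucket-based weighting, assuming each bucket is represented by a single approximate weight. Only the Large range (31-49) is stated in this thread; the other ranges, the weights derived from them, and all names here are hypothetical placeholders.

```python
# Approximate weight per size bucket. Only "Large (31-49 Students)" comes from
# this thread; the other ranges are hypothetical placeholders.
BUCKET_WEIGHTS = {
    "Small": 8,    # hypothetical range 1-15
    "Medium": 23,  # hypothetical range 16-30
    "Large": 40,   # midpoint of 31-49
    "XLarge": 75,  # hypothetical 50 or more
}


def bucket_weighted_average(sections):
    """Average per-section values, weighting each by its size bucket.

    `sections` is a list of (value, bucket_label) pairs.
    """
    total_weight = sum(BUCKET_WEIGHTS[bucket] for _, bucket in sections)
    weighted_sum = sum(value * BUCKET_WEIGHTS[bucket] for value, bucket in sections)
    return weighted_sum / total_weight


# A 50% W rate in a Small section combined with a 12% W rate in a Medium one.
print(bucket_weighted_average([(50.0, "Small"), (12.0, "Medium")]))
```

When every section falls in the same bucket, this reduces to the simple average, so it only helps when section sizes differ by at least one bucket.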

Still a trivial fix if you'd like it.

jeffchenoweth commented 9 years ago

I think that weighting would be more accurate, but it's a switch someone on the Course Critique committee should make. I'm surprised it's a FERPA issue: OSCAR preserves the actual class size, so recreating that data is trivial. But neither you nor I make those rules.