martzcodes / gt-course-surveys

Helping students assess course difficulty and workload.
https://omscentral.com

New Average Misleading #29

Closed cmwedge closed 8 years ago

cmwedge commented 8 years ago

The new score/averaging algorithm is misleading. I initially intended to phrase this as a bug report due to the sudden change, but found mention in the Help section of the new algorithm. As a user, a normal average is typical and expected, especially considering historical functionality. Given the current state, without a close look, I would infer that, for example, Big Data is a 4, when literally every review has been a 5 thus far. Converging to a 5 is virtually impossible assuming a reasonable number of reviews over time; even with 150 5-difficulty reviews, a course would still rate as a 4.75.

cgearhart commented 8 years ago

It is not clear how this ranking method is misleading. Difficulty rating is a relative measure to compare classes, so a "very hard" class doesn't ever need to reach an average rating of 5 for users to see that it is a hard class. In your example, it is still clear from the new ranking that BD4H is very difficult because it currently ranks 3rd alongside well-known hard classes like ML and CCA, and above other difficult courses like AOS, IHPC, and CV; as more reviews come in confirming the difficulty, it will continue to rise in the rankings. And although 100% of its ratings have been "very hard" so far, ML & CCA each have about twice as many "very hard" reviews.

Using the Bayesian average is mathematically well-founded and offers some benefits over ranking by the sample mean (the "normal" average). As discussed in Issue #28 where this change was first proposed, the sample mean overstates the importance of extreme ratings for courses with few reviews. Using the Bayesian average is a simple method to stabilize the estimated difficulty ratings by incorporating a measure of confidence accounting for variations in the numbers of reviews for each course.
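
For illustration, one common formulation of a Bayesian average treats the prior as a number of pseudo-reviews at a fixed value and blends them with the real ratings. The sketch below assumes that formulation; the function name, the prior mean of 3.0, and the prior weight of 10 are hypothetical values chosen for the example, not the parameters used by the site.

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=10):
    """Shrink the sample mean toward prior_mean.

    prior_weight acts like a count of pseudo-reviews at prior_mean
    (both defaults are hypothetical, for illustration only).
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


def sample_mean(ratings):
    """The "normal" average, for comparison."""
    return sum(ratings) / len(ratings)


few_reviews = [5] * 10
many_reviews = [5] * 150

print(sample_mean(few_reviews), bayesian_average(few_reviews))
print(sample_mean(many_reviews), bayesian_average(many_reviews))
```

Under these assumed values, a course whose ratings are all 5 starts well below 5 and moves toward it as reviews accumulate, without ever reaching it. This is the stabilizing effect described above, and also the behavior the original report found surprising.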

cmwedge commented 8 years ago

While the Bayesian average may be more mathematically sound, it does not align with user expectations. The column is labeled Average Difficulty; to a typical user, this means a simple average (and this has historically been the case in this application). As I mentioned previously, I initially thought there was a bug in the latest version - surely the average of all 5s would be 5. The Big Data class is interesting because it is such an extreme outlier. While over time, as the number of reviews increases, the score may fix itself, in the interim you're left with students potentially being unaware of just how wide the gap is between, say, BD and CCA or ML. That is worse, to me, than a class with only a handful of reviews giving an inaccurate picture, because it is significantly more obvious to users.

Although perhaps an issue for a different thread, a bigger problem is that old reviews are weighted the same as new reviews, despite courses evolving (in some cases quite drastically) over time.
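
As a rough illustration of what weighting newer reviews more heavily could look like, the sketch below uses exponential decay by review age. Everything in it is an assumption made for the example: the function name, the `(rating, age_in_years)` input shape, and the two-year half-life. Nothing in this thread indicates the app does this.

```python
def recency_weighted_mean(reviews, half_life_years=2.0):
    """Average ratings, halving a review's weight every half_life_years.

    reviews is a list of (rating, age_in_years) pairs (hypothetical shape).
    """
    if not reviews:
        raise ValueError("at least one review is required")
    weighted_sum = 0.0
    total_weight = 0.0
    for rating, age_in_years in reviews:
        weight = 0.5 ** (age_in_years / half_life_years)
        weighted_sum += weight * rating
        total_weight += weight
    return weighted_sum / total_weight


# A recent 5 counts twice as much as a 3 from two years ago.
print(recency_weighted_mean([(5, 0.0), (3, 2.0)]))
```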

cgearhart commented 8 years ago

So is the proposed fix just to change the name of the column to something other than Avg. Difficulty? (Technically, it is still an average...but user experience trumps pedantry.) Perhaps "Difficulty Score"?

There are always more sophisticated models that can be applied to calculating this kind of statistic. The primary benefit of the Bayesian average in this case is that it is very easy to implement in order to achieve the desired effect. Calibrating rating weight based on a model of pairwise rankings between courses for the same user, discounting older reviews, etc., would all add fidelity to the ranking, but I think you're right that those kinds of proposals should be handled in a separate thread.

cmwedge commented 8 years ago

I think an improved label would go a long way. The label could perhaps be "Computed Difficulty" or something similar. Perhaps also, or instead, add one of those little "i" icons that describes or links to the actual calculation.

mehmetbajin commented 8 years ago

I just rolled back the changes. The fact that the change was disputed within 24 hours of introduction is a testament to the fact that the cost trumps the value added.

cgearhart commented 8 years ago

Ah, too bad. Thanks for trying it out.

cmwedge commented 8 years ago

Is it possible to have both, or a toggle? It has value, it's just not immediately obvious what's going on.

mehmetbajin commented 8 years ago

It could be added as a setting in the user profile. @cgearhart would you be up for it?

cgearhart commented 8 years ago

Sure, I could add that. Just let me know.

mehmetbajin commented 8 years ago

@cgearhart Awesome!

We will want to do this in three stages by updating...

  1. ...data model (rules.yaml) to account for the new configuration option within the user entity.
  2. ...database to set the new option to a default of "arithmetic mean".
  3. ...app code to allow configuring the new option and to respect it.

What do you think about exposing the option as a dropdown-list of options for how to average course difficulty? For now, we would just have two options: { "Arithmetic Mean", "Bayesian" }. Inline help text that explains the differences would be nice too.
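
The option described above could be sketched roughly as follows. This is written in Python purely for illustration and is not the app's code: the `averagingMethod` settings key, the enum, the function name, and the prior values are all hypothetical. The only details taken from the thread are the two option labels and the default of "Arithmetic Mean".

```python
from enum import Enum


class AveragingMethod(Enum):
    # The two options proposed in the thread.
    ARITHMETIC_MEAN = "Arithmetic Mean"
    BAYESIAN = "Bayesian"


# Stage 2 of the plan: the default is the arithmetic mean.
DEFAULT_METHOD = AveragingMethod.ARITHMETIC_MEAN


def course_difficulty(ratings, user_settings, prior_mean=3.0, prior_weight=10):
    """Average ratings using the method chosen in the user's settings.

    user_settings is a dict; "averagingMethod" is a hypothetical key.
    prior_mean and prior_weight are illustrative values only.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    chosen = user_settings.get("averagingMethod", DEFAULT_METHOD.value)
    method = AveragingMethod(chosen)
    if method is AveragingMethod.BAYESIAN:
        return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))
    return sum(ratings) / len(ratings)


ratings = [5] * 10
print(course_difficulty(ratings, {}))
print(course_difficulty(ratings, {"averagingMethod": "Bayesian"}))
```

Falling back to the default when the key is missing means existing users would keep seeing the arithmetic mean until they opt in.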

Notes: