Closed cgearhart closed 8 years ago
@cgearhart How different are the results when sorting based on this new method instead of the simple average? You can access the raw data here:
Results of the proposed change for the current data are shown below. Under this model, ratings for new classes are biased toward a prior expectation of 3.25 until enough reviews accumulate, at which point the actual data begins dominating the calculation again. The tables below show the results for m=3.25 with C=15 and C=25.
My initial recommendation of C=25 reflects that 25 is small relative to the long-term expected number of reviews per class. A smaller value like C=15 makes the rating more responsive to user reviews while a class has only a few reviews in the system (C=0 is identical to the arithmetic mean currently in use).
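A minimal Python sketch of the effect described above (the function name and sample ratings are illustrative, not from the actual database): with C=0 the formula reduces to the plain arithmetic mean, while larger C pulls a course with few reviews toward the prior m.

```python
def bayesian_avg(ratings, m=3.25, C=25):
    """Bayesian estimate of the mean rating.

    m -- prior expected rating
    C -- number of pseudo-reviews at the prior (C=0 gives the plain mean)
    """
    return (C * m + sum(ratings)) / (len(ratings) + C)

# A new course with only two extreme reviews:
few_reviews = [5, 5]
print(bayesian_avg(few_reviews, C=0))   # -> 5.0 (plain arithmetic mean)
print(bayesian_avg(few_reviews, C=25))  # pulled toward the prior, about 3.38
```

As more reviews arrive, `len(ratings)` grows and the `C*m` term is increasingly discounted, so established courses are barely affected.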
C = 25
Course | Rating | Rank Change |
---|---|---|
6505 | 4.191 | +2 |
7641 | 4.043 | +3 |
8803-BDHI | 3.933 | -2 |
6220 | 3.787 | -2 |
6210 | 3.702 | +3 |
6476 | 3.677 | +0 |
6310 | 3.665 | +2 |
6601 | 3.596 | -4 |
8803-002 | 3.566 | +2 |
6290 | 3.473 | +0 |
8803-003 | 3.379 | +1 |
7637 | 3.342 | +1 |
8803-004 | 3.306 | -6 |
6340 | 3.236 | +0 |
6460 | 3.202 | +0 |
6400 | 2.914 | +7 |
6475 | 2.864 | -1 |
8803-001 | 2.798 | -1 |
6035 | 2.730 | +1 |
7646 | 2.678 | +1 |
6440 | 2.658 | +1 |
6300 | 2.612 | -4 |
6250 | 2.435 | -4 |
C = 15
Course | Rating | Rank Change |
---|---|---|
6505 | 4.325 | +2 |
8803-BDHI | 4.153 | -1 |
7641 | 4.132 | +2 |
6220 | 3.960 | -2 |
6210 | 3.788 | +3 |
6476 | 3.766 | +0 |
6310 | 3.750 | +2 |
6601 | 3.740 | -4 |
8803-002 | 3.611 | +2 |
6290 | 3.534 | +0 |
8803-003 | 3.430 | +1 |
7637 | 3.353 | +1 |
8803-004 | 3.338 | -6 |
6340 | 3.230 | +0 |
6460 | 3.179 | +0 |
6475 | 2.782 | +0 |
6400 | 2.761 | +6 |
8803-001 | 2.720 | -1 |
6035 | 2.604 | +1 |
6300 | 2.545 | -2 |
7646 | 2.542 | +0 |
6440 | 2.506 | +0 |
6250 | 2.374 | -4 |
I worry about two things:
Thoughts?
> Does the end-user care so much about which ranking algorithm is used? Don't we as "designers" need to choose the optimal, statistically most meaningful one?
Yes, but I would argue the end-user should not be kept in the dark as to how something works or worse, misled, especially if we continue to label it "Avg. Difficulty" or "Avg. Workload" when in fact it is not :)
> The results of applying the new algorithm appear not to be that different from the alternative
They shouldn't be significantly different for any class that already has a lot of reviews. The courses affected will be the ones that have relatively few reviews. This change would make the rankings more stable as new courses continue being released, and would make the rating of each class more stable as data is collected through more reviews.
> "Avg. Difficulty" or "Avg. Workload" when in fact it is not
It is an average. The value currently in use is the maximum-likelihood estimate of the arithmetic mean, which is what most people think of when you say "average". The proposed change would instead use the Bayesian estimate of the arithmetic mean, so it is still accurate to call it the average. I think an FAQ bullet showing the formula from the initial proposal and the values of m and C in use would be sufficiently clear, and you could use "Bayesian estimate of the arithmetic mean" (with a link or reference to the FAQ) as the inline help.
OK, I'm sold! :)
Would you be able to add this in? If so, I'll need your bitbucket username to grant you access to the repo.
Sure - I have the same name on bitbucket and github: cgearhart. https://bitbucket.org/cgearhart/
Just pushed to production. Thank you for your help @cgearhart!
Sorting by average over-emphasizes extreme ratings in courses with few reviews. One common correction for multinomial distributions is to use the Bayesian Average. Using the Bayesian average would better capture the certainty of course difficulty ratings reported in the tool.
In this case, the corrected rating is based on the formula:
```
R_avg(course) = (C*m + sum(difficulty|course)) / (N + C)
```
Here m is a constant prior expectation for the rating, C is a constant number of pseudo-reviews biasing the estimate, and N is the number of reviews for the course. The correction damps the volatility of the average difficulty when there are few reviews, and the prior's influence is discounted as the number of reviews increases.
Using the current data, reasonable values are C=25 (the median number of reviews across all classes currently in the database) and m=3.25 (the overall average difficulty rating across all reviews). These can be treated as constants and are unlikely to need changing in the future.
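The whole proposal can be sketched end to end in a few lines of Python. The data shape, course IDs, and ratings below are illustrative placeholders, not the real review database; the point is how C and m are derived from the data and how the corrected rating re-ranks a new course with one extreme review.

```python
import statistics

# Assumed shape: course id -> list of difficulty ratings (illustrative data).
reviews = {
    "6505": [5, 4, 4, 5],
    "6250": [2, 3, 2],
    "NEW":  [5],          # a brand-new course with a single extreme review
}

# Derive the constants from the data, as proposed:
C = statistics.median(len(r) for r in reviews.values())       # median review count
m = statistics.mean(x for r in reviews.values() for x in r)   # overall mean rating

def bayesian_avg(ratings, m, C):
    # R_avg = (C*m + sum(ratings)) / (N + C)
    return (C * m + sum(ratings)) / (len(ratings) + C)

ranked = sorted(reviews, key=lambda c: bayesian_avg(reviews[c], m, C), reverse=True)
print(ranked)  # the new course no longer jumps straight to the top
```

With this toy data, "NEW" would sort first under the plain arithmetic mean (5.0), but its Bayesian average is pulled toward m and it lands below the well-reviewed "6505", which is exactly the stabilizing behavior the proposal aims for.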