martzcodes / gt-course-surveys

Helping students assess course difficulty and workload.
https://omscentral.com

Enhancement Proposal: Sort by Bayesian Average #28

Closed cgearhart closed 8 years ago

cgearhart commented 8 years ago

Sorting by average over-emphasizes extreme ratings in courses with few reviews. One common correction for multinomial distributions is to use the Bayesian Average. Using the Bayesian average would better capture the certainty of course difficulty ratings reported in the tool.

In this case, the corrected rating is based on the formula:

R_avg(course) = (C*m + sum(difficulty|course)) / (N + C)

m is a constant prior expectation for the rating, C is a constant number of pseudo-reviews weighting the prior, and N is the number of reviews for the course. The correction damps the volatility of the average difficulty when there are few reviews, and the prior's influence diminishes as the number of reviews grows.

Using the current data, reasonable values can be chosen as C=25 (the median of the number of reviews for all classes currently in the database), and m=3.25 (the cumulative average difficulty rating). These can be treated as constants, and are unlikely to need changing in the future.
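The formula above can be sketched in a few lines of Python (the function name and the example ratings are hypothetical; m=3.25 and C=25 are the constants proposed here):

```python
def bayesian_average(ratings, m=3.25, C=25):
    """Bayesian average of a course's difficulty ratings.

    m is the prior expected rating; C is the prior weight, expressed
    as a pseudo-review count. With no ratings the result is simply m.
    """
    n = len(ratings)
    return (C * m + sum(ratings)) / (n + C)

# A course with one extreme rating barely moves off the prior:
print(round(bayesian_average([5.0]), 3))        # 3.317
# With many ratings, the actual data dominates:
print(round(bayesian_average([5.0] * 100), 3))  # 4.65
```

Note how a single 5.0 review shifts the estimate only slightly above the prior of 3.25, which is exactly the stabilizing behavior described above.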

mehmetbajin commented 8 years ago

@cgearhart How different are the results when sorting based on this new method instead of the simple average? You can access the raw data here:

https://gt-surveyor.firebaseio.com/reviews.json

cgearhart commented 8 years ago

Results of the proposed change for the current data set are shown below. Under this model, ratings for new classes will be biased towards an expected review of 3.25 until enough reviews accumulate, at which point the actual data begins dominating the calculation. The tables below show the results for m=3.25 with C=15 and C=25.

My initial recommendation of C=25 reflects that 25 is small compared to the long-term expected number of reviews per class. A smaller value like C=15 makes the rating more responsive to user reviews while there are few reviews in the system (C=0 is identical to the arithmetic mean currently in use).

C = 25

| Course | Rating | Change |
|--------|--------|--------|
| 6505 | 4.191 | +2 |
| 7641 | 4.043 | +3 |
| 8803-BDHI | 3.933 | -2 |
| 6220 | 3.787 | -2 |
| 6210 | 3.702 | +3 |
| 6476 | 3.677 | +0 |
| 6310 | 3.665 | +2 |
| 6601 | 3.596 | -4 |
| 8803-002 | 3.566 | +2 |
| 6290 | 3.473 | +0 |
| 8803-003 | 3.379 | +1 |
| 7637 | 3.342 | +1 |
| 8803-004 | 3.306 | -6 |
| 6340 | 3.236 | +0 |
| 6460 | 3.202 | +0 |
| 6400 | 2.914 | +7 |
| 6475 | 2.864 | -1 |
| 8803-001 | 2.798 | -1 |
| 6035 | 2.730 | +1 |
| 7646 | 2.678 | +1 |
| 6440 | 2.658 | +1 |
| 6300 | 2.612 | -4 |
| 6250 | 2.435 | -4 |

C = 15

| Course | Rating | Change |
|--------|--------|--------|
| 6505 | 4.325 | +2 |
| 8803-BDHI | 4.153 | -1 |
| 7641 | 4.132 | +2 |
| 6220 | 3.960 | -2 |
| 6210 | 3.788 | +3 |
| 6476 | 3.766 | +0 |
| 6310 | 3.750 | +2 |
| 6601 | 3.740 | -4 |
| 8803-002 | 3.611 | +2 |
| 6290 | 3.534 | +0 |
| 8803-003 | 3.430 | +1 |
| 7637 | 3.353 | +1 |
| 8803-004 | 3.338 | -6 |
| 6340 | 3.230 | +0 |
| 6460 | 3.179 | +0 |
| 6475 | 2.782 | +0 |
| 6400 | 2.761 | +6 |
| 8803-001 | 2.720 | -1 |
| 6035 | 2.604 | +1 |
| 6300 | 2.545 | -2 |
| 7646 | 2.542 | +0 |
| 6440 | 2.506 | +0 |
| 6250 | 2.374 | -4 |
mehmetbajin commented 8 years ago

I worry about two things:

  1. What this new ranking algorithm means is not immediately obvious, even with inline help. I had to read and reread the description before it sank in; at that point I was on board with the value added, but there is a barrier to that appreciation.
  2. The results of the new algorithm do not appear to be very different from the current one, at least based on the raw values themselves. In other words, the new algorithm may not provide insights new and valuable enough to offset the cost of #1 above.

Thoughts?

cmeury commented 8 years ago

Does the end-user care so much about which ranking algorithm is used? Don't we as "designers" need to choose the optimal, statistically most meaningful one?

mehmetbajin commented 8 years ago

Yes, but I would argue the end-user should not be kept in the dark as to how something works, or worse, misled, especially if we continue to label it "Avg. Difficulty" or "Avg. Workload" when in fact it is not :)

cgearhart commented 8 years ago

> The results of applying the new algorithm appear to not be that different from the alternative

They shouldn't be significantly different for any class that already has a lot of reviews. The courses affected will be the ones that have relatively few reviews. This change would make the rankings more stable as new courses continue being released, and would make the rating of each class more stable as data is collected through more reviews.

"Avg. Difficulty" or "Avg. Workload" when in fact it is not

It is an average. The value currently in use is the maximum-likelihood estimate of the arithmetic mean, which is what most people think of when you say "average". The proposed change would instead use the Bayesian estimate of the arithmetic mean, so it is still accurate to call it an average. I think providing an FAQ bullet showing the formula from the initial proposal and the values of m and C in use would be sufficiently clear, and you could use "Bayesian estimate of the arithmetic mean" (with a link to the FAQ) as the inline help.

mehmetbajin commented 8 years ago

OK, I'm sold! :)

Would you be able to add this in? If so, I'll need your bitbucket username to grant you access to the repo.

cgearhart commented 8 years ago

Sure - I have the same name on bitbucket and github: cgearhart. https://bitbucket.org/cgearhart/

mehmetbajin commented 8 years ago

Just pushed to production. Thank you for your help @cgearhart!