Use harmonic mean for average score

friedger commented 5 years ago

What is the problem you are seeing? Please describe.

As discussed in https://github.com/blockstack/app-mining/issues/117#issuecomment-503745287 (by @hstove), reviewer scores contribute differently to the average score:

For many apps, you will likely find that a big part of their score came from 1 reviewer. If you do think that apps shouldn't be rewarded for doing really well with 1 reviewer,

How is this problem misaligned with goals of app mining?

App developers can excel in the domain of one reviewer and ignore the others as the average score is the arithmetic mean. App developers should try to excel in all domains.

What is the explicit recommendation you’re looking to propose?

Use the harmonic mean instead of the arithmetic mean: https://en.wikipedia.org/wiki/Harmonic_mean

The harmonic mean requires positive values. Hence, shift all scores up by a constant (the absolute value of the smallest negative score, or simply a fixed value such as 3), compute the harmonic mean of the shifted scores, and then shift the result back down by the same constant.
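A minimal sketch of the shift-and-shift-back idea, assuming a fixed shift of 3 and made-up reviewer scores (the function names and the numbers are illustrative only, not taken from the app-mining code or results). Note that the shift has to be strictly larger than the absolute value of the smallest score, otherwise a shifted score of zero makes the reciprocal undefined:

```python
def arithmetic_mean(scores):
    """Plain average, as currently used for the final score."""
    return sum(scores) / len(scores)


def shifted_harmonic_mean(scores, shift=3.0):
    """Harmonic mean of (score + shift), shifted back by the same constant.

    `shift` is an assumed parameter; it must make every score strictly positive.
    """
    shifted = [s + shift for s in scores]
    if any(v <= 0 for v in shifted):
        raise ValueError("shift must make every score strictly positive")
    hm = len(shifted) / sum(1.0 / v for v in shifted)
    return hm - shift


# Hypothetical scores from four reviewers (not real app-mining data).
lopsided = [2.0, -1.0, -1.0, -1.0]        # excels with one reviewer only
balanced = [-0.25, -0.25, -0.25, -0.25]   # same arithmetic mean, evenly spread

for name, scores in (("lopsided", lopsided), ("balanced", balanced)):
    print(name, arithmetic_mean(scores), shifted_harmonic_mean(scores))
```

Both apps have the same arithmetic mean, but the harmonic variant ranks the balanced app above the lopsided one, which is the behaviour this proposal is after.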

Describe your long term considerations in proposing this change. Please include the ways you can predict this recommendation could go wrong and possible ways to mitigate.

This change should also encourage app developers to care about all reviewers and create great apps.

Additional context

The harmonic mean is always smaller than the arithmetic mean (if there is at least one pair of different values). The value chosen for the shift affects the resulting harmonic mean.
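To illustrate both points, here is a small sketch using `statistics.harmonic_mean` from the Python standard library. The scores and the candidate shift values are assumptions chosen for the example, and `shifted_hm` is a hypothetical helper name:

```python
from statistics import harmonic_mean, mean

# Hypothetical scores from four reviewers (not real app-mining data).
scores = [2.0, -1.0, -1.0, -1.0]


def shifted_hm(values, shift):
    """Harmonic mean of the shifted values, shifted back afterwards."""
    return harmonic_mean([v + shift for v in values]) - shift


# Try a few assumed shift values; each must exceed 1.0 so that every
# shifted score is strictly positive.
results = {shift: shifted_hm(scores, shift) for shift in (2.0, 3.0, 10.0)}

print("arithmetic mean:", mean(scores))
for shift, value in results.items():
    print("shift", shift, "-> shifted harmonic mean", value)
```

For these numbers every shifted harmonic mean stays below the arithmetic mean, and a larger shift moves the result closer to it, so the shift effectively controls how strongly uneven scores are penalized.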

hstove commented 5 years ago

@friedger do you have any data that supports your hypothesis that a harmonic mean will have a better end result? I would be interested in any calculations you've done, or if you've modified any past results to use a harmonic mean, and to see how that changes things.

App developers can excel in the domain of one reviewer and ignore the others as the average score is the arithmetic mean. App developers should try to excel in all domains.

I am not sure that a normal average allows you to "ignore" a single reviewer. A bad score from one reviewer definitely weakens your overall score, especially with only 4 reviewers. If you look at the top apps, they all have pretty good scores across all reviewers.

friedger commented 5 years ago

I did some calculations here for June: https://docs.google.com/spreadsheets/d/1cNqwf9DtBepLf-M51p2KdOhDdbpiH44dRkOfn5LnNEU/edit?usp=sharing

The impact is not very big, but it does make some corrections.

The main reason for this proposal is the discussion in #117 about the same reviewer having a different impact on different apps.

hstove commented 4 years ago

I think it adds too much complication, when we already have a mechanism to ensure balanced scores via the "theta" function. I would personally recommend against this route. I do very much appreciate that you put together some data and shared this insight! I was not familiar with the harmonic mean.

friedger commented 4 years ago

The theta function only balances the scores between apps, not between reviewers. It makes sure that the max and min values across all apps for one reviewer do not differ too much from those of another reviewer. However, it does not balance the max and min values of a single app across the reviewers.

I see that the impact of this change might not be too big.

stacks-archive / app-mining

Use harmonic mean for average score #121