Contextualize attendance information

ddohler commented 12 years ago

Tony suggested that we provide not only an absolute percentage, but also an indication of the MP's relative rank alongside attendance numbers. For example: "Attendance: 85% (this MP has attended an average number of votes)"

If we want to do this, I suggest the following:

Calculate the average and standard deviation of MPs' attendance rates.
Display the following text for these values of an MP's attendance rate:
- Average out to +/- 1 sigma: "this MP has attended an average number of votes"
- +/- 1 sigma out to +/- 2 sigma: "this MP has attended an above average / a below average number of votes"
- Beyond +/- 2 sigma: "this MP has attended a significantly above average / below average number of votes"
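As a rough illustration of the bands proposed here, a minimal Python sketch follows. The function name `sigma_band_label`, the inclusive band edges, and the sample numbers are assumptions made for the example; the issue does not specify how values exactly on a boundary should be treated.

```python
def sigma_band_label(rate, avg, sigma):
    """Map an attendance rate to the proposed text, by distance from the average in sigmas."""
    if sigma == 0:
        # Every MP has the same rate, so everyone is "average".
        return "this MP has attended an average number of votes"
    distance = (rate - avg) / sigma
    if abs(distance) <= 1:  # ASSUMPTION: band edges are inclusive
        return "this MP has attended an average number of votes"
    if abs(distance) <= 2:
        direction = "an above average" if distance > 0 else "a below average"
        return f"this MP has attended {direction} number of votes"
    direction = "above average" if distance > 0 else "below average"
    return f"this MP has attended a significantly {direction} number of votes"


# Made-up figures for illustration: average 90%, standard deviation 2%.
for rate in (90, 93, 80):
    print(rate, "->", sigma_band_label(rate, avg=90, sigma=2))
```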

Thoughts? @TamaraCIPDD

EkaR commented 12 years ago

If I may present my opinion, that would simply be restating what the numbers already say: that 100% is good and 20% is bad. I think it would not be interesting for our users.

Have you considered ranking MPs? The ones with the best and the worst voting attendance. One of our users suggested that.

ddohler commented 12 years ago

I don't think this is restating what the numbers already say; what it does is give an approximate rank, without giving an exact rank. For example, if the average attendance rate is 90% with a standard deviation of 2%, then you would see that someone with an 80% attendance rate has a "significantly below average" attendance rate, even though 80% actually looks pretty good when you view it by itself. So this is a way of ranking MPs, without giving an exact rank.

Tony strongly suggested that we avoid giving exact ranks unless we are very sure that we want MPs competing to improve their rank. MySociety used to do this with a number of statistics on TheyWorkForYou, but they found out that it gave MPs incentives to do silly things simply to boost their TWFY rankings. So while we may want MPs who are doing a very poor job of voting to improve their performance, we don't necessarily want the MPs who are already doing an average or good job to start fighting with each other to improve their rank, because it might distract them from other important things that are part of the job too.

EkaR commented 12 years ago

We have a lot of MPs with very low attendance so I am guessing our 80% will be above average and 20% is going to be below. Besides, if 80% really was below average compared to others, would it be nice/fair of us to say that? 80% is still good.

I also don't think it would be a good idea to compare them, both for that reason and because right now there is nothing to be done to improve their attendance. In the future though we should approach them and explain that we are doing this (maybe this will encourage them to hit one of the buttons instead of just sitting around).

ddohler commented 12 years ago

Okay, we had some more internal discussion on this, and I think we're going to do it. Before we get too deep into it, @sebastiantransparency could you let us know what the average and standard deviation actually are?

sebastiantransparency commented 12 years ago

avg: 579.6, avg deviation: 174.9, std deviation: 16.5. i removed the representatives with 0 attendance from the set

sebastiantransparency commented 12 years ago

i should add some interpretation of that data: the standard deviation seems relatively low to me, so i guess we can use average deviation == 2 * sigma. this way we would have sigma = 175/2 = 87.5 for everybody, which gives:
- 580 < attendance < 667.5 (= 580 + sigma): average
- 667.5 < attendance < 755 (= 667.5 + sigma): above average
- 755 < attendance < 791 (max): significantly above average

and the same for the below average values.

i would implement this as one field with 5 choices. these choices would have labels like 'above average', 'average', etc. and are choosable by an editor when editing the representative -> in case an editor wants to change the classification. the field would also be filled automatically when running the command _updateattendance (which also calculates deviations and attendance records from the voting records)

what do you reckon?

ddohler commented 12 years ago

Here are my reactions / questions:

I think we should include all MPs (as opposed to Representatives) even if their attendance is 0.
I don't think we need to do anything to the standard deviation to compensate for the tight range. If it's a tight range and most people fall within one standard deviation of average, that will just make the outliers stand out more.
I'm a little confused about the units for the numbers you're providing; are the units "total votes attended"? I'm not sure it makes sense to calculate it this way, because some MPs who have been around for two terms have a much higher potential maximum than MPs who have only been around for one term. So I think it would be better to calculate percentage attendance first and then calculate the standard deviation and average of the percentages.
As far as implementation, my preference would be to assign an automatically calculated score to each MP which would (I think) be something like ((num votes attended) - (avg. votes attended)) / (std. dev), and then do the "average/above average/significantly above average" in the display logic. I think that the attendance numbers are going to be the main place where we get pushback from the MPs themselves, so I think it will be useful if we can show them a single number that the formula outputs, that they can check themselves.

sebastiantransparency commented 12 years ago

alright, include the 0, too (and the numbers are for MPs only)
yes, it's the absolute number of votes. i didn't consider that it might include several terms; percentage would be better, indeed. the percentage is available already, too.
i have no clue about statistics, so i will use your formula and come back with new numbers in a few.

sebastiantransparency commented 12 years ago

Average: 72.6, Average deviation: 23.1, Standard deviation: 2.4 (of the percentages)

sebastiantransparency commented 12 years ago

hm, calculating the value of ((num votes attended) - (avg. votes attended)) / (std. dev) for each MP yields values ranging from -30 to +11. so maybe the thresholds for the classification should be: significantly below < -10 < below < -5 < average < 5 < above < 10 < significantly above?

ddohler commented 12 years ago

Wow, 2.4 standard deviation is really tight. Sorry, I (sort of) gave you the wrong formula before; that was based on the absolute vote numbers you were using, but if we use percentages, then it should be ((percentage) - (avg. percentage)) / (std. dev of percentage)
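A minimal sketch of the percentage-based score, in case it helps to see it spelled out. The function name `attendance_scores`, the MP labels, and the sample percentages are made up for illustration; using the population standard deviation (`pstdev`) is an assumption, on the grounds that the set covers all MPs rather than a sample.

```python
from statistics import mean, pstdev


def attendance_scores(percentages):
    """Return ((percentage) - (avg. percentage)) / (std. dev of percentage) for each MP."""
    values = list(percentages.values())
    avg = mean(values)
    sigma = pstdev(values)  # ASSUMPTION: population standard deviation
    if sigma == 0:
        # All MPs have identical attendance; no one deviates from the average.
        return {mp: 0.0 for mp in percentages}
    return {mp: (pct - avg) / sigma for mp, pct in percentages.items()}


# Hypothetical attendance percentages, not real data.
percentages = {"MP A": 50.0, "MP B": 70.0, "MP C": 90.0}
scores = attendance_scores(percentages)
for mp, score in scores.items():
    print(f"{mp}: {score:+.2f}")
```

Because the output is a single number per MP, an MP could in principle recompute it from the published average and standard deviation.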

Could you send me a list of the attendance percentages (or is it easy to get from the API?)? I'd like to play with the numbers a bit and see what looks reasonable. But for now, I'm still thinking of sticking with +/- 2.4 = average, +/- 4.8 = above/below average, and anything farther out is significantly above/below.

sebastiantransparency commented 12 years ago

the range values provided above are for the percentages. currently, you can only query the API for these values like this:

http://shenmartav.ge/api/v1/representative/?format=json&limit=20&offset=100

(it seems the limit value is arbitrary, which looks like a way to bring down the server)

in the result, look for the field attendance_record to find the percentage (now that we know more about how we want to use attendance records, it might be worth refactoring btw). i will send you a list by email shortly.
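A hedged sketch of paging through that endpoint from Python. Only the URL, its three query parameters, and the field name `attendance_record` come from this thread; the `objects` key in the response and the shape of each entry are assumptions about the JSON layout and should be checked against a real response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://shenmartav.ge/api/v1/representative/"


def page_url(limit=20, offset=0):
    """Build the query URL for one page of representatives."""
    return BASE_URL + "?" + urlencode({"format": "json", "limit": limit, "offset": offset})


def fetch_page(limit=20, offset=0):
    """Download and decode one page (performs a network request)."""
    with urlopen(page_url(limit, offset)) as response:
        return json.load(response)


def attendance_from_page(payload):
    """Collect the attendance_record value of each representative on a page.

    ASSUMPTION: representatives are listed under an "objects" key.
    Entries without an attendance_record field are skipped.
    """
    return [
        rep["attendance_record"]
        for rep in payload.get("objects", [])
        if "attendance_record" in rep
    ]


# Offline example with a hand-written payload in the assumed shape.
sample = {"objects": [{"attendance_record": 85.0}, {"attendance_record": 97.5}, {}]}
print(page_url(limit=20, offset=100))
print(attendance_from_page(sample))
```

Keeping `limit` small and stepping `offset` is the gentler way to walk the whole list, given the concern about large limit values.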

ddohler commented 12 years ago

Okay, I've looked at the data, and it doesn't look anywhere close to a normal distribution, so standard-deviation bands aren't really going to tell us much: reading "within 1 sigma" or "within 2 sigma" as typical mainly makes sense for roughly normal data. My spreadsheet calculates the standard deviation to be about 28, so going out even 1 standard deviation above the average would get us to about 100%, because the data is clumped toward the high end.
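To illustrate the problem with made-up numbers (these are not the real attendance figures): when most values sit near the top and a few sit very low, the average plus one standard deviation can land above 100%, so no MP could ever be labelled "above average".

```python
from statistics import mean, median, pstdev

# Made-up attendance percentages, clumped toward the high end with a few very low values.
rates = [98, 97, 96, 95, 94, 92, 90, 85, 30, 5]

avg = mean(rates)
sigma = pstdev(rates)
upper_band = avg + sigma  # upper edge of the "average" band

print(f"average={avg:.1f} median={median(rates):.1f} sigma={sigma:.1f}")
print(f"average + 1 sigma = {upper_band:.1f}")

# With this data the band edge is past 100%, so this list is empty.
above_average = [r for r in rates if r > upper_band]
print("MPs more than 1 sigma above average:", above_average)
```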

So instead, I propose to use deciles: rank the percentages lowest - highest, and then divide into 10 equally-sized groups. Each group will be assigned labels:
- Group 0: "This MP has attended a very low number of votes, compared to other MPs"
- Groups 1, 2: "This MP has attended a low number of votes, compared to other MPs"
- Groups 3, 4, 5, 6: "This MP has attended an ordinary number of votes, compared to other MPs"
- Groups 7, 8: "This MP has attended a high number of votes, compared to other MPs"
- Group 9: "This MP has attended a very high number of votes, compared to other MPs"

Practically, what this means is that the top 13.7 MPs will get "very high" rank, then the 27.4 below them will get "high" rank, then the next 54.8 MPs will get "ordinary" rank, the next 27.4 will get "low" rank, and finally the bottom 13.7 will get "very low" ranks.

As far as dealing with the rounding issue because the number of MPs doesn't divide evenly by ten, I'm not sure there's a perfect way to handle it, but I think pre-calculating the array indexes as floats, and then rounding to integers through some sane method will probably give decent results.
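One possible reading of that approach, as a sketch rather than the implementation: compute the ten group boundaries as floats (`i * n / 10`), round them to integer indexes, and slice the ranked list. The names `assign_deciles` and `group_label` are hypothetical. The example uses 137 MPs, a count inferred from the 13.7-per-decile figure rather than stated anywhere. Python's built-in `round` rounds exact halves to the nearest even integer, which is just one "sane method" among several. MPs with identical percentages can still end up on either side of a boundary.

```python
from collections import Counter

LEVELS = ["very low", "low", "low", "ordinary", "ordinary",
          "ordinary", "ordinary", "high", "high", "very high"]


def group_label(group):
    """Return the display text for decile group 0..9."""
    return f"This MP has attended a {LEVELS[group]} number of votes, compared to other MPs"


def assign_deciles(percentages):
    """Map each MP to a decile group 0 (lowest attendance) .. 9 (highest)."""
    ranked = sorted(percentages, key=percentages.get)  # MP keys, lowest to highest
    n = len(ranked)
    # Float boundaries i * n / 10, rounded to integer slice indexes.
    bounds = [round(i * n / 10) for i in range(11)]
    groups = {}
    for group in range(10):
        for mp in ranked[bounds[group]:bounds[group + 1]]:
            groups[mp] = group
    return groups


# Hypothetical data: 137 MPs with evenly spread percentages.
percentages = {f"mp{i}": i * 0.5 for i in range(137)}
groups = assign_deciles(percentages)
sizes = Counter(groups.values())
print([sizes[g] for g in range(10)])
print(Counter(LEVELS[g] for g in groups.values()))
```

With 137 MPs this rounding yields groups of 13 or 14, so the label counts come out close to the 13.7 / 27.4 / 54.8 split.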

sebastiantransparency commented 12 years ago

done

ddohler commented 12 years ago

Looks sweet. Some surprises, too -- 90% attendance isn't all that special, which means that on the whole, MPs have pretty good attendance.

sebastiantransparency commented 12 years ago

indeed, you need like > 97% to be in the top group.

tigeorgia / shenmartav

Contextualize attendance information #80