Feature request: enhanced outputs for `create_rank()`

martinctc commented 3 years ago

Problem

The current `create_rank()` approach risks surfacing many very small groups that are outliers on the metric but, because of their size, might not be meaningful to a stakeholder.

Solution

To get around this, we could add an option that ranks subgroups by a new calculated "delta", where delta is how much the population average would change if the subgroup were excluded. That way, big subgroups with moderately outlying metric values would be prioritized in the ranking over tiny subgroups with extremely outlying metric values.
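One way to see why this favors larger subgroups is to write the delta out. This is a sketch under assumed notation: $M$ is the population average, $m$ the subgroup average, $N$ the population size and $n$ the subgroup size, with delta taken as the observed average minus the average without the subgroup (a sign convention assumed here).

$$
\Delta = M - \frac{M N - m n}{N - n} = \frac{n}{N - n}\,(m - M)
$$

Under these assumptions, delta is the subgroup's deviation from the population average scaled by $n / (N - n)$, so a small deviation in a large subgroup can outweigh a large deviation in a tiny one.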

Weighting by population size could let a stakeholder or a change management executive target change programs based on population size. It could be helpful for them to know that a group they have selected is ranked 5th but has a larger population.

Notes

The above issue is abridged from a discussion with Jessalyn Uchacz and Carlos Shrimpton.

This issue is linked with the feature request in #102.

moralec commented 3 years ago

Here is an example of how this could work:

| | Collab Hours | N | Vs Mean | Rank (Vs Mean) | Mean without | Delta | Rank (Delta) |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Team 1 | 20.0 | 50 | 0.8x | 3 | 32.1 | -7.1 | 3 |
| Team 2 | 30.0 | 30 | 1.2x | 2 | 22.3 | 2.7 | 1 |
| Team 3 | 45.0 | 5 | 1.8x | 1 | 23.8 | 1.3 | 2 |
| Total | 25.0 | 85 | | | | | |

moralec commented 3 years ago

Method is really simple:

Calculate what the average would look like if you excluded that group.

This is: `(TotalHours * N - GroupHours * n) / (N - n)`

where:

- TotalHours: average for the population
- GroupHours: average for the group in scope
- N: population size
- n: group size
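As a quick check against the figures in the example table (population average 25.0 with N = 85, Team 1 average 20.0 with n = 50), the average without Team 1 works out as:

$$
\frac{25.0 \times 85 - 20.0 \times 50}{85 - 50} = \frac{2125 - 1000}{35} = \frac{1125}{35} \approx 32.1
$$

This matches the "Mean without" value shown for Team 1.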

Then you can calculate the delta as the real (observed) average minus the average calculated excluding that group.
Finally, you can rank from the highest to the lowest value of delta.
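The steps above can be sketched in a few lines. This is only an illustration of the arithmetic, written in Python rather than as a change to `create_rank()`; the `Group` class and the `rank_by_delta()` function are hypothetical names introduced here, and the input figures are the three teams from the example table.

```python
from dataclasses import dataclass


@dataclass
class Group:
    # Hypothetical container for one subgroup (not part of the wpa package).
    name: str
    mean: float  # average metric value for the group, e.g. collaboration hours
    n: int       # group size


def rank_by_delta(groups):
    """Return (name, mean_without, delta, rank) tuples, rank 1 = largest delta."""
    total_n = sum(g.n for g in groups)
    total_sum = sum(g.mean * g.n for g in groups)
    pop_mean = total_sum / total_n  # observed population average

    rows = []
    for g in groups:
        if g.n == total_n:
            raise ValueError("cannot exclude a group that is the whole population")
        # Average of the population with this group left out.
        mean_without = (total_sum - g.mean * g.n) / (total_n - g.n)
        # Delta: observed average minus the average without the group.
        rows.append((g.name, mean_without, pop_mean - mean_without))

    # Rank from the highest to the lowest delta.
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    return [(name, mw, d, i + 1) for i, (name, mw, d) in enumerate(ordered)]


teams = [
    Group("Team 1", 20.0, 50),
    Group("Team 2", 30.0, 30),
    Group("Team 3", 45.0, 5),
]

for name, mean_without, delta, rank in rank_by_delta(teams):
    print(f"{name}: mean without = {mean_without:.2f}, delta = {delta:+.2f}, rank = {rank}")
```

With these inputs the ordering is Team 2, Team 3, Team 1, which agrees with the delta ranking in the example table. The sketch ranks on the signed delta, so groups that pull the average down land at the bottom; ranking on the absolute value instead would be a different design choice.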
