microsoft / wpa

R package for analyzing and visualizing data from Microsoft Workplace Analytics
https://microsoft.github.io/wpa

significance_report: Add statistical significance tests to core metrics to make some claim about a population value #121

Open juliajuju93 opened 3 years ago

juliajuju93 commented 3 years ago

Baseline: The core functions within the wpa package currently generate visuals and summary tables, so data interpretation still depends on the analyst. As a next step, we could formally test any hypothesized relationship within our dataset.

Idea: Significance tests give us a formal process for using sample data to evaluate a claim about a population value. We calculate p-values to see how likely a result at least as extreme as the sample result would be if it arose by random chance alone, and we use those p-values to draw conclusions about hypotheses.

For example: We have seen that the Sales organization shows more after-hours collaboration than all the other organizations. We might suspect that Sales has a higher mean of after-hours collaboration than the rest of the company. But do we really have evidence that the overall mean for the Sales org is higher? The opposing proposition, that there is no difference between the groups, is known as the 'null hypothesis'. With a test of significance we could assess whether the data provide enough evidence to reject it, that is, whether the difference between those groups is statistically significant.
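To make the Sales example concrete, here is a minimal sketch of one possible significance test, a one-sided permutation test on the difference in means. It is written in plain Python purely for illustration and is not how the wpa package or p_test() works. The function name `permutation_p_value` and all the numbers are hypothetical; the values stand in for after-hours collaboration hours per person.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_p_value(group, rest, n_resamples=10_000, seed=0):
    """One-sided permutation test for mean(group) > mean(rest).

    Null hypothesis: group labels are irrelevant, so the observed
    difference in means could arise from random relabelling alone.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = mean(group) - mean(rest)
    pooled = list(group) + list(rest)
    n_group = len(group)

    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_group]) - mean(pooled[n_group:])
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical after-hours collaboration hours (illustrative data only).
sales = [6.1, 7.4, 5.8, 8.0, 6.9, 7.2, 6.5, 7.8]
others = [4.2, 5.1, 3.9, 4.8, 5.5, 4.4, 5.0, 4.6, 3.7, 5.3]

p_value = permutation_p_value(sales, others)
print(f"difference in means: {mean(sales) - mean(others):.2f}")
print(f"p-value: {p_value:.4f}")
```

A small p-value here would mean that a difference this large is rarely produced by shuffling the labels, which is the evidence needed to reject the null hypothesis of 'no difference'.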

Outcome:

Disclaimer: This is just an idea :)

martinctc commented 3 years ago

Thanks @juliajuju93 for this - we have a function called p_test() that performs a similar test on the data:

https://microsoft.github.io/wpa/reference/p_test.html

We should also consider whether it makes more sense to weave the tests into the existing reports, or whether they should be a stand-alone report. Would weaving these tests in help your case?

Tagging @m-m-powers who had a similar idea on this previously!

juliajuju93 commented 3 years ago

Thanks for pointing this function out. I could imagine that a report with some explanation of why to run the test and what to look out for would be beneficial.

martinctc commented 3 years ago

The alternative is to incorporate this into some of our existing functions, like create_rank(). In the returned table, each row can be tested against the group average to indicate whether the differences are statistically significant.

This could be reused if we create a report focussed on significance testing.
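As a rough illustration of the table idea above, the sketch below ranks groups by their mean and adds a p-value and a significance flag to each row. It is plain Python rather than R and does not reflect the actual create_rank() output. The names `two_sided_p` and `rank_with_significance`, the `alpha` threshold, and the data are all hypothetical, and "tested against the group average" is interpreted here as comparing each group with everyone outside it.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def two_sided_p(group, rest, n_resamples=5_000, seed=0):
    """Two-sided permutation test on the difference in means."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(mean(group) - mean(rest))
    pooled = list(group) + list(rest)
    k = len(group)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


def rank_with_significance(values_by_group, alpha=0.05):
    """Return one row per group, sorted by mean (highest first)."""
    rows = []
    for name, values in values_by_group.items():
        # Everyone who is not in the current group.
        rest = [
            v
            for other, other_values in values_by_group.items()
            if other != name
            for v in other_values
        ]
        p = two_sided_p(values, rest)
        rows.append(
            {
                "group": name,
                "mean": mean(values),
                "p_value": p,
                "significant": p < alpha,
            }
        )
    return sorted(rows, key=lambda row: row["mean"], reverse=True)


# Hypothetical after-hours collaboration hours (illustrative data only).
data = {
    "Sales": [6.1, 7.4, 5.8, 8.0, 6.9, 7.2],
    "Finance": [4.2, 5.1, 3.9, 4.8, 5.5, 4.4],
    "HR": [5.0, 4.6, 3.7, 5.3, 4.9, 4.1],
}

table = rank_with_significance(data)
for row in table:
    print(row)
```

Because every row is tested separately, a table with many groups runs many tests at once, so a multiple-comparison correction would be worth considering before flagging rows as significant.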

moralec commented 3 years ago

I really like this idea. I think the create_rank function could work as a good starting point to do this, with a view of extending this to create_bar.

I suggest we address this in combination with: