One idea for this module is to have a couple of parts:
1. Why we are not teaching stats
2. How to do good exploratory data analysis
For point 1, some things to say:
Statistics is hard, very hard. We like to simplify it with recipes, but that only gives us the illusion that it is simpler than it really is.
"statistical significance" is a vague term that rests on arbitrary thresholds (why p < 0.05 or p < 0.01? What if p = 0.055?)
"statistical significance" is not synonymous with "biological significance" - need to focus more on effect sizes
"non-significance" is a myth (almost by definition, two things being exactly the same is impossible; if nothing else, there's imprecision in measurements)
There are more technical points, and we need good examples to illustrate these things:
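the p-value is the probability of observing data at least as extreme as ours under the assumption of no effect. It is not the probability of no effect.
multiple testing is a problem: the more tests we run, the more likely it is that at least one comes out "significant" by chance alone
if you sample enough, any statistical test you do will be "significant" - no matter how small the effect is (as long as it is not exactly zero)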
What is a sample? Or, the importance of independence of measurements (for example, if you image two sections from 10 plants, you don't have 20 independent data points - images from the same plant might be correlated with each other, thus they are not independent observations)
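As a possible starting point for the examples mentioned above, here is a minimal simulation sketch in Python (standard library only) covering two of the points: multiple testing and "enough samples make anything significant". Everything in it is an assumption made for illustration: the function names are made up, the numbers (200 tests, 50 observations per group, an effect of 0.05 standard deviations) are arbitrary, and the p-value comes from a large-sample z approximation rather than a proper t-test.

```python
import random
from statistics import NormalDist, mean, stdev


def two_sample_p(a, b):
    """Two-sided p-value for a difference in means.

    Uses a large-sample z approximation (an assumption of this sketch),
    not an exact t-test.
    """
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_count(n_tests=200, n=50, alpha=0.05, seed=1):
    """Run many tests where there is truly no effect; count 'significant' ones."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # Both groups are drawn from the same distribution: no real difference.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if two_sample_p(a, b) < alpha:
            hits += 1
    return hits


def p_for_tiny_effect(n, effect=0.05, seed=1):
    """p-value for a tiny but non-zero true difference, with n observations per group."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(effect, 1) for _ in range(n)]
    return two_sample_p(a, b)


if __name__ == "__main__":
    print("'significant' results out of 200 tests with no real effect:",
          false_positive_count())
    for n in (20, 1_000, 100_000):
        print(f"n = {n:>7} per group, tiny effect, p = {p_for_tiny_effect(n):.4g}")
```

With no real effect anywhere, roughly 5% of the tests should still fall below 0.05, and with a very large sample the tiny effect should come out "significant" even though its magnitude has not changed. The independence point (several images from the same plant) is not covered here and would need its own example.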
So, if we dismiss teaching statistical inference on the grounds that p-values are vague and don't inform us about what we are interested in, what is the solution?
Bayesian statistics - but this is hard... (so we won't teach it, because we - or at least I - don't know enough about it!)