I asked students from a PhD course to look at the p-value app, and they were very confused by the p-hacking tool. I am sharing their feedback in the hope that it helps with development. I think the confusion stems from the tool trying to achieve two separate demonstrations:
The first is the problem of selective reporting: if only 'statistically significant' studies get published, the average effect size of those studies is biased upwards. This bias is more acute when the sample size per group is low (parameter 1) or when Cohen's d is close to zero (parameter 2).
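For what it's worth, here is a minimal Python sketch of that first point (not the app's code; the group size, true Cohen's d, number of studies, and alpha level are illustrative values I chose). Averaging the observed effect size only over the significant studies overstates the true d:

```python
# Illustrative simulation of selective reporting with two-sample t-tests.
# All parameter values are placeholders, not the app's defaults.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate(n_per_group=10, true_d=0.2, n_studies=10_000, alpha=0.05):
    observed_d, significant = [], []
    for _ in range(n_studies):
        a = rng.normal(0.0, 1.0, n_per_group)      # control group
        b = rng.normal(true_d, 1.0, n_per_group)   # treatment group shifted by the true d
        _, p = stats.ttest_ind(b, a)
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        observed_d.append((b.mean() - a.mean()) / pooled_sd)
        significant.append(p < alpha)
    observed_d = np.array(observed_d)
    significant = np.array(significant)
    return observed_d.mean(), observed_d[significant].mean()

all_mean, sig_mean = simulate()
print(f"mean observed d, all studies:         {all_mean:.2f}")  # close to the true d
print(f"mean observed d, significant studies: {sig_mean:.2f}")  # biased upwards
```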
The second is the p-hacking tool (a misnomer?), which is used to investigate the effect of sequential testing, under which data are added until the study reaches statistical significance. In this case the bias is due to a lack of acknowledgement of the stopping rule. The p-value will decrease as the sample size increases, but the average effect size would be weighted by sample size (admittedly, I did not check whether this is the case).
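A similar sketch for this second demonstration (again not the app's code; the starting sample size, batch size, maximum n, and number of simulated studies are values I picked for illustration). With a true Cohen's d of zero, stopping as soon as p < 0.05 pushes the false-positive rate above the nominal 5%:

```python
# Illustrative simulation of optional stopping: add observations in batches
# until p < 0.05 or a maximum sample size is reached, ignoring the stopping rule.
# All parameter values are placeholders, not the app's defaults.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def sequential_study(true_d=0.0, start_n=10, step=10, max_n=200, alpha=0.05):
    a = list(rng.normal(0.0, 1.0, start_n))
    b = list(rng.normal(true_d, 1.0, start_n))
    while True:
        _, p = stats.ttest_ind(b, a)
        if p < alpha or len(a) >= max_n:
            pooled_sd = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
            d = (np.mean(b) - np.mean(a)) / pooled_sd
            return p, d, len(a)                   # final p, observed d, final n per group
        a.extend(rng.normal(0.0, 1.0, step))      # add another batch and re-test
        b.extend(rng.normal(true_d, 1.0, step))

results = [sequential_study() for _ in range(2_000)]
p_vals = np.array([r[0] for r in results])
print(f"false-positive rate under optional stopping: {(p_vals < 0.05).mean():.2f}")
```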
My students were confused by the sample size reported for the p-hacking studies (which is that of the largest study): it was not clear what it referred to.
The values of Effect size (true) and Effect size (sim) were rounded to one digit, while Cohen's d can be modified in increments of 0.01.
With a large number of draws and a large number of samples, the app lags.