Closed snim2 closed 9 years ago
I like where this is going, but I wonder if we should allow arbitrary confidence intervals (if possible) rather than simply 90|95|99. Also, might it be a good idea to print out the warning about "less than 30" at startup as well/instead of at the end?
Happy to make those changes, but I'm not sure how best to calculate Z, in terms of a consistent coding style. Usually I would pull Z from a LUT. Is that what you would prefer to see, or is there a better way to implement this?
Just FYI I've never written a man page before, so my troff is probably rather ropy!
The maths for z scores is beyond me, but we could provide a LUT for all the integers? That would give a pretty decent illusion of arbitrary confidence intervals. I'm not sure what the right level of precision for the LUT is, but I'm inclined to think we should be storing the z scores at more than 2 decimal places (since we're reporting times to 3 decimal places).
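For illustration, a LUT along those lines might look like the sketch below. It is not the code from this PR: the names `z_entry`, `z_table` and `z_for_level` are made up, and only the three levels discussed so far are filled in, using the standard two-sided critical values rounded to 4 decimal places. A table covering every integer level would have to be generated rather than typed in by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup table: two-sided z critical values,
 * stored to 4 decimal places. */
struct z_entry {
    int level;   /* confidence level, in percent */
    double z;    /* matching critical value */
};

static const struct z_entry z_table[] = {
    { 90, 1.6449 },
    { 95, 1.9600 },
    { 99, 2.5758 },
};

/* Return the z value for `level`, or -1.0 if the level is not tabulated. */
double z_for_level(int level)
{
    size_t i;

    /* A confidence level is a percentage strictly between 0 and 100. */
    assert(level > 0 && level < 100);
    for (i = 0; i < sizeof z_table / sizeof z_table[0]; i++) {
        if (z_table[i].level == level)
            return z_table[i].z;
    }
    return -1.0;
}
```

Returning a sentinel for untabulated levels lets the caller decide whether to reject the option or fall back to a default.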
A quick bit of searching on Google suggests we could use a t (rather than z) score if num_runs < 30? I reserve the right not to have understood that correctly!
Yes, I was working from Georges et al. (2007) [1] and that was the reason for issuing the warning when n<30. The alternative is to have a 2D LUT which maps 1-99 to both the t and Z values. That seemed like overkill to me, but it isn't impossible.
I'm happy to get the z values right first, then worry about t values later.
OK, fix pushed, thanks to Paul Wilson (UoW) for providing a way to generate tables of t- and z-values in R.
This looks really good!
Thanks! Hope it conforms to your general coding style / conventions.
I've now pulled this into a branch in my repo 'confidence' (and added you as a collaborator). Let's push new commits there.
One question I don't immediately know the answer to: how many DP should the +/- print? I suspect that 6 is too many, but I'm not sure if 3 (as with all the other fields) is too little. Any thoughts?
I don't know either, but I noticed when testing that programs with a very short running time tend to give very small CI values, so it was difficult to do many quick tests with only 3dp.
How many dp do you generally use in your publications? I'd always assumed you just formatted the output of multitime, or a similar tool, in LaTeX?
We've always gone for 3DP, but I'm not claiming that's necessarily the best value. We used it because at greater DP we found that jitter tends to dominate.
So, reading http://www.nature.com/bdj/authors/guidelines/statistics.html and similar texts, it seems like the standard practice is to print sample means to "one more decimal place than was measured" and one extra decimal place for any CI or standard error that is added to the mean.
OK, so if we do measurements to 3DP, CI should be at 4DP. I'd be happy with that.
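As a small sketch of that convention (means at 3DP, the +/- at 4DP), assuming a plain printf-style format string. The helper name `format_mean_ci` is invented for this example and the exact output layout in multitime may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Write "mean +/- half_width" into buf: 3 decimal places for the mean,
 * 4 for the interval. Returns what snprintf returns. */
int format_mean_ci(char *buf, size_t len, double mean, double half_width)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%.3f +/- %.4f", mean, half_width);
}
```

For example, a mean of 1.2346 with a half-width of 0.01234 comes out as `1.235 +/- 0.0123`.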
Pushed to both forks BTW
I think this is now good enough to be in master. We can tinker with it more there (possibly on other branches).
Awesome, thanks.
And branch deleted. Thanks Sarah, I'm really glad you did this!
Added -c switch to command line options. This takes the values 90, 95 or 99. Output warns if num_runs < 30. Added -c switch to man page. Added confidence intervals to output.
This could do with checking over before merging!
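For anyone checking this over, validating the -c argument could look roughly like the sketch below. It is illustrative only and not the code in the patch: `parse_conf_level` is a hypothetical helper, and it accepts just the three values listed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse the argument to -c. Returns 90, 95 or 99 on success,
 * or -1 if the string is not one of those values. */
int parse_conf_level(const char *arg)
{
    char *end;
    long v;

    assert(arg != NULL);
    errno = 0;
    v = strtol(arg, &end, 10);
    /* Reject overflow, empty input and trailing junk such as "95x". */
    if (errno != 0 || end == arg || *end != '\0')
        return -1;
    if (v != 90 && v != 95 && v != 99)
        return -1;
    return (int)v;
}
```

Checking `end` as well as `errno` matters here, since `strtol` on its own would happily accept "95x" as 95.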