marcpabst / ANOVA.jl

Provides a Simple Way to Calculate ANOVAs From Fitted Linear Models.

What are ANOVA types I-III? #8

Closed epogrebnyak closed 5 years ago

epogrebnyak commented 5 years ago

A few more references in the README could help new users; some suggestions are here. What are ANOVA types I, II, and III, and where does this terminology come from?

marcpabst commented 5 years ago

As far as I know, the terminology originates from the early days of computing, when computing time was expensive and certain "tricks" were used to calculate ANOVAs. A little bit of background can be found here.

A good summary of the different types of ANOVA can be found here: https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/.
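For readers who want to see the distinction in numbers, here is a minimal sketch in plain Python (not the ANOVA.jl API). It assumes a made-up, deliberately unbalanced data set with two binary factors `a` and `b` and no interaction term; the helper names `ols_rss` and `design` are hypothetical. Type I sums of squares are sequential, so they depend on the order in which terms enter the model, while Type II adjusts each main effect for the other. Because the interaction is left out, Type II and Type III coincide here, so Type III is not shown separately.

```python
def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, via the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

# Made-up, unbalanced data (illustration only): cell counts differ,
# so the two factors are not orthogonal.
a = [0, 0, 0, 0, 0, 1, 1, 1]
b = [0, 0, 0, 1, 1, 0, 1, 1]
y = [4.0, 5.0, 6.0, 7.0, 9.0, 8.0, 11.0, 12.0]

def design(*cols):
    """Design matrix: an intercept column plus the given dummy columns."""
    return [[1.0] + [float(c[i]) for c in cols] for i in range(len(y))]

rss_0 = ols_rss(design(), y)       # intercept only
rss_a = ols_rss(design(a), y)      # intercept + A
rss_b = ols_rss(design(b), y)      # intercept + B
rss_ab = ols_rss(design(a, b), y)  # intercept + A + B

# Type I (sequential): each term is adjusted only for the terms before it.
type1_a_first = {"A": rss_0 - rss_a, "B": rss_a - rss_ab}
type1_b_first = {"B": rss_0 - rss_b, "A": rss_b - rss_ab}

# Type II: each main effect is adjusted for the other main effect.
type2 = {"A": rss_b - rss_ab, "B": rss_a - rss_ab}

print("Type I, A first:", type1_a_first)
print("Type I, B first:", type1_b_first)
print("Type II:        ", type2)
```

With balanced data all of these would agree; the imbalance is what makes the Type I result depend on the order of the terms.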

epogrebnyak commented 5 years ago

Thank you for the links! The Exegeses on Linear Models by W. Venables (1998) is a great read and gives a better understanding of R's roots.

Since the immediate question is answered, I am closing the issue, but I propose any of the following at the maintainer's discretion:

https://github.com/marcpabst/ANOVA.jl/blob/ecfec4fbe771cb78ab49f9a0bb4c4eb2a82110fa/src/ANOVA.jl#L85-L122

marcpabst commented 5 years ago

I was quite busy over the last few months and didn't really have time to work on this package, apart from fixing a few things Julia 0.7/1.0 broke. You're quite right, it could definitely use some work. I created this package because I wanted to familiarize myself with Julia and there was no easy way to calculate an ANOVA table. So creating this seemed like a good starting point. It works ok for simple models (meaning that it gives the same results as other statistical software packages out there) but I certainly would like to add an easier interface - especially for repeated-measures types of ANOVA because they are so widely used.

I also agree with adding more information to the README. We should also add some links to resources on the debate about whether calculating an ANOVA is actually good scientific practice (there has been some discussion in the community).

Anyway, I'm grateful for any PRs, preferably by people who have a better understanding of the underlying statistics than I have.

On a related note, there was a very short discussion a few months back about what an ANOVA function should look like - but in the end, I decided that it would make the most sense to just create a glorified print function for (linear) models, because that's what an ANOVA basically is. As far as I can remember, I played with the idea of creating something like AnovaDataFrameRegressionModel that could be passed to fit(), but it just seemed a little over-the-top.

Oh, and another short note on code style: I'm open to any proposals to make the code more "Julia-esque" - this code is actually based on the style used by the corresponding R package...

epogrebnyak commented 5 years ago

I support the idea of ANOVA as a view on regression results, even though instruction in non-econometrics disciplines treats ANOVA as a stand-alone method, hiding the internal linear model.
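The equivalence can be checked numerically. The following is a small sketch in plain Python under stated assumptions: the three groups are made-up illustration data, and `ols_rss` is a hypothetical helper, not part of any package. It computes the one-way F statistic twice, once with the textbook between/within sums of squares and once by comparing an intercept-only linear model against a model with dummy-coded group indicators.

```python
def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, via the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

# Made-up data for three groups (illustration only).
groups = {"g1": [4.0, 5.0, 6.0], "g2": [7.0, 8.0, 9.0, 10.0], "g3": [5.0, 7.0]}
y = [v for vals in groups.values() for v in vals]
labels = [name for name, vals in groups.items() for _ in vals]
n, k = len(y), len(groups)

# 1) Textbook one-way ANOVA: between-group and within-group sums of squares.
grand_mean = sum(y) / n
ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                 for v in groups.values())
ss_within = sum((x - sum(v) / len(v)) ** 2
                for v in groups.values() for x in v)
f_classic = (ss_between / (k - 1)) / (ss_within / (n - k))

# 2) The same test as a comparison of two nested linear models.
levels = list(groups)
X_null = [[1.0] for _ in y]                               # intercept only
X_full = [[1.0] + [1.0 if lab == lev else 0.0 for lev in levels[1:]]
          for lab in labels]                              # intercept + dummies
rss_null = ols_rss(X_null, y)
rss_full = ols_rss(X_full, y)
f_regression = ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))

print(f_classic, f_regression)  # the two values agree up to rounding error
```

The within-group sum of squares is the residual sum of squares of the full model, and the between-group sum of squares is the drop in residual sum of squares when the group dummies are added.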

> this code is actually based on the style used by the corresponding R package...

Can you please point out which one? Is it anova.glm()?

I also wanted to mention the NIST reference datasets, which may be useful for testing: https://www.itl.nist.gov/div898/strd/anova/anova.html