marcpabst / ANOVA.jl

Provides a Simple Way to Calculate ANOVAs From Fitted Linear Models.
Other
21 stars 9 forks source link

ANOVA for LinearMixedModels #22

Open Nosferican opened 4 years ago

Nosferican commented 4 years ago

@pharmcat has implemented ANOVA type III for LinearMixedModels in https://github.com/PharmCat/ReplicateBE.jl. Would it possible to have the functionality ported to this package?

PharmCat commented 4 years ago

Hi! Thank you very much for mentioning the package. There are some things that I should describe. RepliceteBE.jl was developed because MixedModels.jl have no opportunity to get solution for bioequivalence as FDA want to get - split T/R variation with covariation (FA0(2) or CSH). And as I know, developers not planing to include repeated covariance matrix structures. So, ReplicateBE solving mixed models with forward REML function optimization with inverting covariance matrix, and all derivatives can be obtained. Type III table is calculated as multidimensional contrast for each factor (contrast for all levels of factor). (You can see Type 3 Tests of Fixed Effects in SAS MIXED Manual p how it can be done, p. 6136, 6154) If MixedModels.jl can provide all derivatives for REML function and gradient of variance-covariance matrix of fixed effects same thing can be done.

Nosferican commented 4 years ago

cc: @dmbates, @palday, @vjd We could define an API which can have those as accessors. Ideally it would just use the API for StatisticalModel/RegressionModel and that would give it support for GLM, MixedModels and any other that implements the API.

I can't say for sure how feasible some of those are based on the actual estimation techniques used in MixedModels which target a wide variety of scenarios but might not be as general to cover certain cases (i.e., without loss of computational efficiency or code refactoring).

I believe for fitting bioequivalence models with ML, the likelihood ratio test should be the most powerful. Type III residuals (not that I would argue for that choice given one) could be computed by changing the covariates in the various models. For REML, the ANOVA test becomes more salient as the likelihood test would no longer be valid under REML and different fixed effects.

I did notice the license on ReplicateBE.jl was GPL-3.0 which would not allow us to port relevant code to this packages (MIT). Would you be willing to relicense ReplicateBE.jl to a compatible license or give explicit permission for the type III ANOVA code? You could also port the relevant aspects since you are the copyholder as well.

For my own understanding, the residual sums could be computed by fitting the various models based on the residual sum (e.g., (A | B, C, B & C)).

palday commented 4 years ago

Part of the problem for the conditional F-tests in ANOVA is that it the denominator degrees of freedom is somewhat of an open question (and thus draws the Kenward-Roger crowd, who still haven't told us how to do that approximation without inverting a large matrix). You can treat the ddf as infinite (i.e. chi-square instead of F, same as z instead of t), but this then really shows off that it's asymptotically the same as a likelihood-ratio test and so what's the advantage? Although you do correctly note that the LR test is a problem for REML models .... but then again @dmbates has convinced me that the motivation for REML over ML isn't great.

Also, I want to take the time to channel @dmbates and note that just because other popular software does something doesn't mean that it's a good thing. I'm thinking of the Exegesis on Linear Models commentary on whether Type-III tests make sense, especially if we don't enforce orthogonal contrasts.

Regarding an API: There is a proposal over on StatsModels for an ftest. @nalimilan would know more.

All of this was very negative, sorry. It just managed to hit on two popular yet often questionable practices (Type-III sums of squares and the Kenward-Roger ddf approximation) at once. There are occasions where these things have a use, but I worry about "giving the neighborhood kids the key to your F-16".

Nosferican commented 4 years ago

Aye. In bioequivalence analysis Kenward-Roger is pretty popular because the designs are guaranteed to be small (i.e., you can invert the matrix since it is not large; domain specific). @palday is the current dof method Containment Degrees of Freedom Approximation or do we not perform that adjustment? I would argue that ML should probably be the default but REML is still used when the designs are unbalanced which makes FDA really insistent in using the type III (i.e., not even type II).

nalimilan commented 4 years ago

Regarding an API: There is a proposal over on StatsModels for an ftest. @nalimilan would know more.

ftest already exists in GLM. If it's useful for other models, we could move it to StatsModels (or a new StatsModelsExtras package), together with lrtest (https://github.com/JuliaStats/StatsModels.jl/pull/162).

Nosferican commented 4 years ago

@kleinschmidt Could we make a PR to add ftest/lrtest at StatsModels (just the namespace and packages can import it)? I defer on whether we should standardize a LRTest or such struct à la CoefTable.

I do prefer waldtest better than ftest since ftest is just a specific case when the IM has a certain form.

There's the unexported MixedModels.likelihoodratiotest but that could be an alias with lrtest.

palday commented 4 years ago

I think all of this is pointing more and more towards a StatsModelsExtras to host all these tests that require Distributions.

marcpabst commented 4 years ago

Hi everyone, sorry for not participating in this discussion. Is there still interesting in having an ANOVA.jl package?

I created this a while ago because people asked me "how can I run ANOVAs in Julia?" and at that time, there was no good (easy) answer. I know that ANOVAs are often of questionable nature (especially the whole sums of squares thing, which basically no ones using ANOVAs understands),but the fact is, they are heavily used in a lot of fields.

If we decide to keep this package around, a major revamp is necessary, I suppose. (Or, maybe another package implements it by now (?), which would be great!)

palday commented 4 years ago

I think ANOVA provides a convenient summary at times -- and can be a special case of a more general testing of linear hypotheses. I'm thinking something along the lines of the car or afex packages in R, which can provide such tests for lme4 objects.

marcpabst commented 4 years ago

I absolutely agree, but my fear would be that we end up with the same situation R is currently in. You have so many packages that have considerable overlap in their functionality (often with subtle differences in implementation) to a point that nobody really knows which package they should use. Also, interoperability is not always great within the R ecosystem.

Then there are also packages like ezANOVA, which makes it easy to create ANOVAs but hides a lot of the "magic". Lots of people use methods like this although I understand that "real" statisticians have reservations about them.

Ideally, we would have a function ànova() than could be called on any MixedModel object (or any other fitted model) to produce ANOVA tables without refitting. Also, we probably should think about an easy method to produce these models, just like ezANOVA offers.

nalimilan commented 4 years ago

Yes, I agree it would be better to have this in a more central package. @kleinschmidt Do you think this could live in StatsModels? It would seem natural now that lrtest lives there. (For the record, we decided to split essential features to StatsModelBase and have extended features in StatsModels itself rather than creating StatsModelsExtras.)