marcpabst / ANOVA.jl

Provides a Simple Way to Calculate ANOVAs From Fitted Linear Models.
Other
21 stars 9 forks source link

Interface without GLM #25

Open KronosTheLate opened 2 years ago

KronosTheLate commented 2 years ago

I still do not understand how to use GLM to do ANOVA. Nor do I understand what EffectsCoding() is, or what contrasts do in a call to fit.

So below is something I made - a simple interface to take a number of vectors as inputs, and preform anova on them, without making a GLM. My proposition is to include this interface in the package, for users like myself:

using PrettyTables, Statistics, Distributions
SS(inputs::AbstractArray) = sum((i - mean(inputs))^2 for i in inputs)
SSW(all_obs::AbstractVector...) = sum(SS.(all_obs))
SSB(all_obs::AbstractVector...) = sum(length(ys) * (mean(ys) - mean(vcat(all_obs...)))^2 for ys in all_obs)
function SST(all_obs::AbstractVector...)
    ys = vcat(all_obs...)
    sum((y - mean(ys))^2 for y in ys)
end

function anova1(all_obs_input::AbstractVector...)
    all_obs = all_obs_input .|> skipmissing .|> collect
    observations = vcat(all_obs...)
    SS = [SSB(all_obs...), SSW(all_obs...), SST(all_obs...)]
    df = [length(all_obs) - 1, length(observations) - length(all_obs), length(observations) - 1]
    S² = [SS[i] / df[i] for i = 1:3]
    TS = round(S²[1] / S²[2], digits = 5)
    dist = FDist(df[1], df[2])
    p_val = round(1 - cdf(dist, TS[1]), digits = 5)
    table = pretty_table(hcat([:Between, :Within, :Total], SS, df, S²),
        header = ["Variation", "SS", "df", "S²"],
        alignment = :c)
    println("Test statistic = $(TS)")
    println("p-value        = $(p_val)")
    return table
end
v1 = [missing, 4, 2, 4]
v2 = [-1, 0, missing, 1]
v3 = [-2, 1, 1, missing]

anova1(v1, v2, v3)
#returns
┌───────────┬─────────┬────┬─────────┐
│ Variation │   SS    │ df │   S²    │
├───────────┼─────────┼────┼─────────┤
│  Between  │ 22.2222 │ 2  │ 11.1111 │
│  Within   │ 10.6667 │ 6  │ 1.77778 │
│   Total   │ 32.8889 │ 8  │ 4.11111 │
└───────────┴─────────┴────┴─────────┘
Test statistic = 6.25   
p-value        = 0.03411

The following can be added to allow matrix inputs, similar to the API i encountered in MATLAB.

anova1(all_obs_input::AbstractMatrix) = anova1(Vector.(eachcol(all_obs_input))...)
anova1([v1 v2 v3])

If this is to be THE package for ANOVA in Julia, I think that an API without using GLM should be part of it, because a) it is simpler to use, and does not require understanding of GLM b) One can use GLM directly, should one already be fluent in its use, to do ANOVA. This package should have a low bar of entry - it is currently too high for me.

stanlazic commented 2 years ago

Both ANOVA and regression are a linear model (LM), which is a special case of a GLM. If you're modelling the outcome variable as normally distributed (the default for the GLM function), then you're using a LM. After a LM is fit, one can examine the coefficients as in a regression analysis, or examine how the variance is partitioned as in an ANOVA analysis. I've always disliked how many stats courses treat regression and ANOVA as separate.

I think this package needs more documentation and examples instead of another function. I plan on converting the R code from my book (https://github.com/stanlazic/labstats) into Julia, so that may help lower the bar for entry.