raphaelvallat / pingouin

Statistical package in Python based on Pandas
https://pingouin-stats.org/
GNU General Public License v3.0

How to check n-way ANOVA assumptions using pingouin? #356

Closed vabatista closed 1 year ago

vabatista commented 1 year ago

Suppose I'm doing this experimental design with 3 factors, 2 levels each and 3 repetitions:

import pingouin as pg
import numpy as np
import pandas as pd
from itertools import product

y_measures = np.array([28,25,27,18,19,23,36,32,32,31,30,29,28,25,22,18,19,23,12,32,40,31,30,29])
factors = ["Factor A", "Factor B", "Factor C"]
levels_list = [['low','high'],['low','high'],['low','high']]
replicates = 3

def generate_dataframe(measures, factors, levels_list, replicates):

    lines = []
    for factor_combination in product(*levels_list):
        line = {}
        for idx, factor in enumerate(factors):
            line[factor] = factor_combination[idx]
        for k in range(replicates):
            lines.append(line)
    df = pd.DataFrame(lines,columns=factors)
    df['y'] = measures
    return df

df = generate_dataframe(y_measures, factors, levels_list, replicates)

1) What is the correct function to run an ANOVA with repeated measures using 3 or more factors? I tried rm_anova, but it raises an error for more than 2 factors. I'm trying this, but I'm not sure it is correct:

model1 = pg.anova(dv='y', between=factors, data=df, detailed=True)

2) I saw that there is a function pg.power_anova. What exactly does it measure?

3) What is the correct way to test the assumptions of ANOVA for a factorial design like the one above? Should I test the normality of the measures (y) for each factor in my experiment, or should I test y grouped by all factors in my dataframe? And what about variance? I wrote the code below, but I'm not sure I'm doing it right:

from scipy import stats

measures = []
for name, group in df.groupby(factors):
    group_measures = group['y'].values
    # Note: stats.normaltest needs at least 8 observations per group,
    # so it cannot give a valid p-value with only 3 replicates per cell.
    k2, p = stats.normaltest(group_measures)
    print('Normality test for group', name, p >= 0.05)
    print('Variance for group', name, np.var(group_measures, ddof=1)) # type: ignore
    measures.append(group_measures)
# Levene's test for equal variances across all groups
k2, p = stats.levene(*measures) # type: ignore
print('Variance test between groups', p, 'p >= 0.05', p >= 0.05)

raphaelvallat commented 1 year ago

Hi,

  1. If this is a fully repeated-measures ANOVA, you can use pg.rm_anova and pass a list of column names from your dataframe to the within parameter. Please see the examples in https://pingouin-stats.org/build/html/generated/pingouin.rm_anova.html or in this notebook: https://github.com/raphaelvallat/pingouin/blob/master/notebooks/01_ANOVA.ipynb. If you have one within-subject (repeated) factor and one between-subject factor (e.g. group), then you want to use the mixed_anova function.

  2. I think the documentation of power_anova (https://pingouin-stats.org/build/html/generated/pingouin.power_anova.html#pingouin.power_anova) is pretty explicit and includes several examples. Please make sure to read it carefully and check the example notebook above as well.

  3. The assumptions differ depending on whether you are using a one-way, one-way repeated-measures, or mixed-design ANOVA. Some of the functions that are relevant are:

Have you considered using free software such as JAMOVI or JASP? They are pretty intuitive and let you check the assumptions for simple or complex ANOVA designs.