Incorrect statistics and p-values for MANOVA

snarles commented 2 years ago

My issue is about the output of hyppo.ksample.MANOVA. It is known that in the 2-sample case, MANOVA is equivalent to Hotelling's T2 test. This can be verified by comparing hyppo.ksample.Hotelling and statsmodels.multivariate.manova. However, hyppo.ksample.MANOVA differs from both hyppo.ksample.Hotelling and statsmodels.multivariate.manova.

Reproducing code example:

import numpy as np
from numpy.random import randn
from statsmodels.multivariate import manova
from hyppo.ksample import MANOVA, Hotelling
rng = np.random.default_rng(2022)
n = 1000
p = 20
x1 = rng.standard_normal((n, p))
y1 = np.zeros((n, 1))
x2 = rng.standard_normal((n, p))
y2 = np.ones((n, 1))
x = np.vstack([x1, x2])
y = np.vstack([y1, y2])
y = y - np.mean(y)
stat, pvalue = MANOVA().test(x1, x2)
print(stat, pvalue)
# (0.0009554388955993614, 0.9999999301031335)
stat, pvalue = Hotelling().test(x1, x2)
print(stat, pvalue)
# (1.794917015985669, 0.016507943732996515)
print(manova.MANOVA(x, y).mv_test())
#                   Multivariate linear model
#===============================================================
#                                                               
#---------------------------------------------------------------
#           x0           Value   Num DF   Den DF  F Value Pr > F
#---------------------------------------------------------------
#          Wilks' lambda 0.9822 20.0000 1980.0000  1.7952 0.0165
#         Pillai's trace 0.0178 20.0000 1980.0000  1.7952 0.0165
# Hotelling-Lawley trace 0.0181 20.0000 1980.0000  1.7952 0.0165
#    Roy's greatest root 0.0181 20.0000 1980.0000  1.7952 0.0165
#===============================================================

Version information

OS: macOS 10.15.7
Python Version 3.8.5
Package Version 0.3.2

sampan501 commented 2 years ago

Thank you for your issue, will investigate and see if I can find the bug in the code.

sampan501 commented 2 years ago

I found the bug, seems to be an array broadcasting issue. Should work now. I'll patch it and send in an updated version soon

neurodata / hyppo

Incorrect statistics and p-values for MANOVA #323

Reproducing code example:

Version information