neurodata / hyppo

Python package for multivariate hypothesis testing
https://hyppo.neurodata.io/

Index error in MGCX #409

Open Rayerdyne opened 6 months ago

Rayerdyne commented 6 months ago

My issue is about an IndexError raised when calling MGCX.test(). The error is originally thrown by SciPy's multiscale_graphcorr (see the stack trace below).

I'm very surprised that this depends on the random number generation: it fails for some seeds but not for others. Increasing the number of replications (reps) seems to increase the probability that the error occurs; for example, setting reps=1000 makes seed 16 fail as well. The seed dependence makes me think I messed up somewhere, but I can't see where.

Reproducing code example:

import sys

import pandas as pd
import numpy as np

from hyppo.time_series import MGCX

def test(seed):
    print(f"Testing seed {seed}")
    reps=100

    df = pd.DataFrame([[1, 1],
                       [2, 1],
                       [3, 1],
                       [4, 4],
                       [5, 5],
                       [6, 6]], columns=["a", "b"])

    i_test = MGCX()
    rstate = np.random.RandomState(seed)

    stat, pval, d = i_test.test(df["a"].values, df["b"].values, random_state=rstate, reps=reps)
    print(f"stat: {stat}, pval: {pval}, d: {d}")

if __name__ == "__main__":
    if len(sys.argv) > 1:
        seed = int(sys.argv[1])
        test(seed)

    else:
        test(16)
        test(0)
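
The seed and reps dependence described above can be measured by running the same call over a range of seeds and counting how many raise IndexError. The following is a minimal sketch, not part of the original report: the helper names find_failing_seeds, mgcx_trial and report are hypothetical, and the data and the MGCX().test(...) call are copied from the reproducing example.

```python
import numpy as np


def find_failing_seeds(trial, seeds):
    """Run trial(seed) for each seed and return the seeds that raise IndexError."""
    failing = []
    for seed in seeds:
        try:
            trial(seed)
        except IndexError:
            failing.append(seed)
    return failing


def mgcx_trial(seed, reps=100):
    """Same data and call as the reproducing example (requires hyppo)."""
    from hyppo.time_series import MGCX  # imported lazily so the helper above stays generic

    x = np.array([1, 2, 3, 4, 5, 6])
    y = np.array([1, 1, 1, 4, 5, 6])
    rstate = np.random.RandomState(seed)
    return MGCX().test(x, y, random_state=rstate, reps=reps)


def report(seeds, reps_values=(100, 1000)):
    """Print how many of the given seeds fail for each number of replications."""
    seeds = list(seeds)
    for reps in reps_values:
        failing = find_failing_seeds(lambda s: mgcx_trial(s, reps), seeds)
        print(f"reps={reps}: {len(failing)}/{len(seeds)} seeds fail: {failing}")
```

Calling report(range(50)) would show whether the share of failing seeds grows with reps, as observed for seed 16.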

Error message

Testing seed 16
stat: 0.886004262777708, pval: 0.0297029702970297, d: {'opt_lag': 0, 'opt_scale': [6, 4]}
Testing seed 0
Traceback (most recent call last):
  File "/home/f/TRAVAIL/csod/misc/hyppo/problem.py", line 32, in <module>
    test(0)
  File "/home/f/TRAVAIL/csod/misc/hyppo/problem.py", line 22, in test
    stat, pval, d = i_test.test(df["a"].values, df["b"].values, random_state=rstate, reps=reps)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/mgcx.py", line 194, in test
    stat, pvalue, stat_list = super(MGCX, self).test(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/base.py", line 130, in test
    Parallel(n_jobs=workers)(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/base.py", line 159, in _perm_stat
    perm_stat = calc_stat(distx, permy)[0]
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/mgcx.py", line 106, in statistic
    stat, opt_lag = compute_stat(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/_utils.py", line 93, in compute_stat
    indep_test_stat = indep_test.statistic(x, y)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/independence/mgc.py", line 161, in statistic
    mgc = multiscale_graphcorr(distx, disty, compute_distance=None, reps=0)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6490, in multiscale_graphcorr
    stat, stat_dict = _mgc_stat(x, y)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6541, in _mgc_stat
    stat = stat_mgc_map[m - 1][n - 1]
IndexError: index 5 is out of bounds for axis 0 with size 1
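
The message in the last frame can be reproduced with plain NumPy. The sketch below shows one array shape for which the index expression from the stack trace fails with exactly this message. It is an illustration under assumptions, not a confirmed diagnosis: the (1, 6) shape and the order in which n and m are unpacked are guesses, since the stack trace shows only the indexing line.

```python
import numpy as np

# Hypothetical stand-in for the statistic map: one row, six columns.
stat_map = np.zeros((1, 6))

# Assumed unpacking order; the stack trace does not show this line.
n, m = stat_map.shape  # n = 1, m = 6

try:
    stat_map[m - 1][n - 1]  # same index expression as in the stack trace
except IndexError as err:
    message = str(err)
    print(message)

# Indexing in (row, column) order stays in bounds for this shape.
value = stat_map[n - 1][m - 1]
```

Under these assumptions, the failure would only occur when one axis of the map has size 1, which would fit an error that appears for some permutations of the data and not others.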

Version information

sampan501 commented 4 months ago

Sorry for the late response; this only just came onto my radar. I'll look into it.