pwollstadt / IDTxl

The Information Dynamics Toolkit xl (IDTxl) is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory.
http://pwollstadt.github.io/IDTxl/
GNU General Public License v3.0

Fix bug in _perform_fdr_correction #109

Closed daehrlich closed 5 months ago

daehrlich commented 5 months ago

The previous implementation of stats._perform_fdr_correction produces results that differ significantly from those of statsmodels' fdrcorrection. On closer inspection of the code, I've identified the problem in lines 309ff of stats.py:

    if np.invert(sign).any():
        first_false = np.where(np.invert(sign))[0][0]
        sign[first_false:] = False  # avoids false positives due to equal pvals

Here, IDTxl finds the index of the first non-significant p-value in the sorted array of p-values and sets everything after it to non-significant, e.g., 110010 -> 110000 for an ordered array of significance values where 1 is significant and 0 is not. This is at odds with how the Benjamini-Hochberg and Benjamini-Yekutieli correction procedures are designed (see https://en.wikipedia.org/wiki/False_discovery_rate): both are step-up procedures, so one should instead find the last significant p-value and set every test before it to significant, i.e.

    if sign.any():
        signmax = max(np.nonzero(sign)[0])
        sign[:signmax] = True

where this code is taken from statsmodels (with variables slightly renamed). For the same example as above, this yields 110010 -> 111110.
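To make the difference concrete, here is a minimal NumPy sketch that applies both rules to the same sorted p-values. The helper names (bh_thresholds, truncate_at_first_failure, step_up) and the p-values are made up for illustration and are not part of IDTxl or statsmodels. The p-values are chosen so that the raw comparison against the Benjamini-Hochberg thresholds gives exactly the 110010 pattern discussed above.

```python
import numpy as np


def bh_thresholds(n_tests, alpha):
    """Benjamini-Hochberg critical values alpha * i / n for ranks i = 1..n."""
    return alpha * np.arange(1, n_tests + 1) / n_tests


def truncate_at_first_failure(sign):
    """Old behaviour: everything from the first non-significant test on is rejected."""
    sign = sign.copy()
    if np.invert(sign).any():
        first_false = np.where(np.invert(sign))[0][0]
        sign[first_false:] = False
    return sign


def step_up(sign):
    """Step-up rule: everything before the last significant test is accepted."""
    sign = sign.copy()
    if sign.any():
        signmax = max(np.nonzero(sign)[0])
        sign[:signmax] = True
    return sign


# Hypothetical p-values, already sorted in ascending order.
pvals_sorted = np.array([0.001, 0.010, 0.030, 0.035, 0.040, 0.200])
sign = pvals_sorted <= bh_thresholds(len(pvals_sorted), alpha=0.05)

old = truncate_at_first_failure(sign)
new = step_up(sign)

print("raw    :", sign.astype(int))  # 1 1 0 0 1 0
print("old    :", old.astype(int))   # 1 1 0 0 0 0
print("step-up:", new.astype(int))   # 1 1 1 1 1 0
```

With the old rule, the test at rank 5 is lost even though its p-value is below its own threshold; the step-up rule keeps it and also accepts ranks 3 and 4.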

This branch fixes the issue by calling the correct statsmodels implementation directly.
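For reference, a call to the statsmodels routine might look like the sketch below. It is an illustration only, not the code in this branch: the p-values are invented, and how IDTxl wires the result into its own data structures is not shown. It assumes statsmodels is installed and uses statsmodels.stats.multitest.fdrcorrection, where method='indep' selects Benjamini-Hochberg and method='negcorr' selects Benjamini-Yekutieli.

```python
import numpy as np
from statsmodels.stats.multitest import fdrcorrection

# Hypothetical, unsorted p-values; fdrcorrection sorts internally and
# returns its results in the original order.
pvals = np.array([0.040, 0.001, 0.200, 0.010, 0.035, 0.030])

# Benjamini-Hochberg (independent or positively correlated tests).
reject_bh, pvals_corrected_bh = fdrcorrection(pvals, alpha=0.05, method="indep")

# Benjamini-Yekutieli (arbitrary dependence between tests).
reject_by, pvals_corrected_by = fdrcorrection(pvals, alpha=0.05, method="negcorr")

print("BH reject:", reject_bh)
print("BY reject:", reject_by)
```

Because the function handles sorting and the step-up logic itself, the caller does not need to reimplement either step.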

pwollstadt commented 5 months ago

Merged into develop