Fix issues in statistical testing

daehrlich commented 10 months ago

This pull request addresses two issues with the current implementation of the statistical testing:

Wrong unsorting In both _perform_fdr_correction and max_statistic_sequential in stats.py, the p-values are first sorted and their argsort is saved. After the statistical testing, the resulting significance-arrays are supposed to be unsorted, i.e. brought back into the original order, in order to filter out the non-significant targets or links.

However, this unsorting is not correctly implemented, as the p-value argsort from before is used in a wrong way. The implementation reads

significance_array = significance_array[argsort_idxs],

where it should instead read

significance_array[argsort_idxs] = significance_array.copy()

where the copy is necessary to avoid concurrent modification.

Incorrect number of total significance tests in fdr test To get the correct p-value thresholds for the fdr correction, the formula requires the total number of tests that are performed. In _perform_fdr_correction, this number n is computed from the number of p-values that are submitted. However, this is not equal to the total number of comparisons in this multiple comparison testing problem, since in the function network_fdr, the targets are first filtered by their omnibus significance. I believe this is wrong, and the filtering step should be eliminated so that the n in the fdr correction does in fact coincide with the total number of targets (if correct_by_target==True) or links (if correct_by_target==False).

I have implemented fixes to both issues in the branch dev_fdr_fix. Since these fixes affect the results that idtxl produces, I strongly recommend a thorough code review of the changes before a merge is performed.

mwibral commented 10 months ago

I agree with the detected bugs. Sorting should be corrected.

For the case of correct_by_target==True the number of comparisons should indeed be the number of targets (as per Novelli et al., Network Neuroscience, 2019).

For the case correct_by_target==False one should correct by: number_of_candidates = number_of_unique_combinations_of_timelags_and_source_processes.

This is because at the moment IDTxl only provides p-values per candidate, not per link as fas as I know. (But this should be double checked! Although I doubt it, there maybe per link mTE and p-values prewsent in the returned results -- if so, correction of the per-link p-values by the number of links in the network would be preferable).

Note: an efficient test at the level of candidates (sources x lags) would require to test all candidates across all targets simultaneously and enter them into the maximum statistics for inclusion. This non-hierachical testing is not supported in IDTxl at the moment and would require a major rework of the full code-path.

pwollstadt commented 10 months ago

Merged into develop

pwollstadt / IDTxl

Fix issues in statistical testing #107