netneurolab / neuromaps

A toolbox for comparing brain maps
https://netneurolab.github.io/neuromaps

[BUG] Why do burt2020, burt2018 and moran not return values from the original input array? #180

Open JohannesWiesner opened 2 weeks ago

JohannesWiesner commented 2 weeks ago

Issue summary

I want to create null models for my parcellated volumetric statistical image (which includes cortical and subcortical regions). I have tried all three available functions (burt2020, burt2018, moran). My intuitive understanding of null models is that the original values of the brain regions are "shuffled" across the brain while preserving spatial autocorrelation. I therefore assumed that each column in the output matrix would contain the same values as my input, just in a different order. But this does not seem to be the case. Is this behavior expected, or is it a bug?

Detailed issue description

Here's the input data: inputs.zip

Steps to reproduce issue

Here's some code to reproduce the problem:

from neuromaps.nulls import burt2020,burt2018,moran
import numpy as np
from nilearn.image import load_img
from nilearn.plotting import plot_stat_map,plot_roi
from joblib import Memory
import time

# load the values from our statistical image. This is a vector with t-values
# for our 376 regions (360 cortical, 16 subcortical). Note, this image is
# also available as a NIfTI file (t_stats_img.nii), but burt2020 etc. expect
# an array for `data` as soon as you provide an atlas image as `parcellation`
t_stats = np.load('t_stats.npy')

# load the corresponding atlas image. This image has 376 unique values (0
# represents background). Corresponding contralateral parcels (e.g. left V1
# and right V1) have different values
atlas_img = load_img('glasser_tian.nii.gz')

# we can plot our atlas and our statistical image
t_stats_img = load_img('t_stats_img.nii')
plot_stat_map(t_stats_img,cut_coords=(0,0,0),draw_cross=False)
plot_roi(atlas_img,cut_coords=(0,0,0),draw_cross=False)

# and we can check that the resolutions match (1 mm)
print(t_stats_img.header.get_zooms())
print(atlas_img.header.get_zooms())

# set cache object
memory = Memory('.',verbose=0)

# we cache the function so it has to run only once
@memory.cache
def get_null_models(data,parcellation,method,atlas='mni152',density='1mm',n_perm=10,n_proc=20,seed=42):

    if method == 'burt2020':
        nulls = burt2020(data=data,atlas=atlas,density=density,parcellation=parcellation,n_perm=n_perm,n_proc=n_proc,seed=seed)
    elif method == 'burt2018':
        nulls = burt2018(data=data,atlas=atlas,density=density,parcellation=parcellation,n_perm=n_perm,n_proc=n_proc,seed=seed)
    elif method == 'moran':
        nulls = moran(data=data,atlas=atlas,density=density,parcellation=parcellation,n_perm=n_perm,n_proc=n_proc,seed=seed)
    else:
        raise ValueError(f'Unknown method: {method}')

    return nulls

start = time.time()
nulls_burt2020 = get_null_models(data=t_stats,parcellation=atlas_img,method='burt2020')
duration_burt2020 = (time.time() - start) / 3600
print(f'Burt2020 took {duration_burt2020} hrs')

start = time.time()
nulls_burt2018 = get_null_models(data=t_stats,parcellation=atlas_img,method='burt2018') 
duration_burt2018 = (time.time() - start) / 3600
print(f'Burt2018 took {duration_burt2018} hrs')

start = time.time()
nulls_moran = get_null_models(data=t_stats,parcellation=atlas_img,method='moran')
duration_moran = (time.time() - start) / 3600
print(f'Moran took {duration_moran} hrs')

# sanity check: do the null models contain the same values as t_stats, just in a different order?
t_stats_sorted = np.sort(t_stats)
nulls_burt2020_sorted = np.sort(nulls_burt2020,axis=0)
nulls_burt2018_sorted = np.sort(nulls_burt2018,axis=0)
nulls_moran_sorted = np.sort(nulls_moran,axis=0)

# Check if each sorted column in the null model array equals the sorted input values
for matrix in [nulls_burt2020_sorted,nulls_burt2018_sorted,nulls_moran_sorted]:
    print(np.all(np.all(matrix == t_stats_sorted[:,None], axis=0)))

Software version

Python 3.9.20 (conda-forge, GCC 13.3.0); neuromaps 0.0.5+27.ga89b699


JohannesWiesner commented 2 weeks ago

See also: https://neurostars.org/t/create-null-models-for-volumetric-parcellated-subcortical-cortical-statistical-image/30489/3

justinehansen commented 2 weeks ago

Hi Johannes, thanks for your question.

The spatial null models currently implemented in neuromaps fall into two categories: spatial permutation nulls ("spin test") and parameterized null models. The spatial permutation nulls involve projecting cortical data to a sphere and rotating this sphere. This is the version where the values of the original vector can be perfectly preserved ("perfect permutations"), although there are also "imperfect" versions of this null that can result in duplicated or missing values from the original array. (Note that by "imperfect" I don't mean that these nulls are inferior.)
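
As a minimal sketch of that surface workflow (the annotation and parameters below are only illustrative, and the files are downloaded by neuromaps on first use):

import numpy as np
from neuromaps import datasets, images, nulls

# fetch an example cortical annotation on the fsaverage 10k surface
annot = datasets.fetch_annotation(source='abagen', desc='genepc1',
                                  space='fsaverage', den='10k')
data = images.load_data(annot)

# spatial permutation ("spin") nulls: the data are projected to a sphere,
# the sphere is rotated, and null values are re-assigned from the original
# vertex values (so some values may be duplicated or dropped)
rotated = nulls.alexander_bloch(data, atlas='fsaverage', density='10k',
                                n_perm=10, seed=42)
print(rotated.shape)  # (n_vertices, n_perm)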

However, spin tests can only be used on cortical surfaces. In your case, since you're generating a spatial null for a volumetric image that includes noncortical regions, you are limited to the parameterized nulls (e.g. burt2018, burt2020, moran). These nulls use different methods, but ultimately they all "generate" new brain maps that approximate the spatial autocorrelation of the original brain map (e.g. burt2020 optimizes the variogram fit between the null and empirical maps; moran uses the eigenvectors of the distance matrix to generate null data with identical Moran's I). All this to say: no, you should not expect the same values in the null maps for these methods.
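
To show how these parameterized nulls are meant to be used downstream, here is a sketch reusing t_stats and nulls_burt2020 from your script (another_map is just a hypothetical second parcellated map for illustration):

import numpy as np
from neuromaps import stats

# hypothetical second parcellated map (same 376 regions) to compare against
rng = np.random.default_rng(42)
another_map = rng.standard_normal(t_stats.shape)

# the generated nulls serve as the permutation distribution: the correlation
# is recomputed against each null column to obtain a p-value that respects
# the spatial autocorrelation of the empirical map
r, p = stats.compare_images(t_stats, another_map, metric='pearsonr',
                            nulls=nulls_burt2020)
print(r, p)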

Some useful resources for learning more about spatial nulls: Markello & Misic 2021 NeuroImage, and here is a recorded educational workshop on spatial nulls from OHBM.

All the best, Justine

JohannesWiesner commented 2 weeks ago

@justinehansen: Thanks so much for the answer, that explains a lot! Alright, then I know that nothing "wrong" is happening.

It still leaves me with the thresholding problem (because you cannot use the same threshold as in the original image), but that is a separate issue. It might be interesting for @jbburt though, because of this comment:

what if you simply generated surrogate maps prior to binarization? ie, permute the original map (using brainsmash) with the "resample" parameter set to True (ie, perform an SA-invariant permutation), then (using the same threshold as before) binarize your surrogate map to get your ROI mask, then use these surrogate ROI masks to generate samples from your null distribution?
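
A rough sketch of that suggestion with brainsmash (the distance-matrix file name and the threshold below are placeholders, not part of the original data):

import numpy as np
from brainsmash.mapgen.base import Base

# placeholder inputs: the unthresholded parcellated map and a
# parcel-to-parcel distance matrix (assumed to exist already)
brain_map = np.load('t_stats.npy')
distmat = np.load('parcel_distance_matrix.npy')  # hypothetical file

# resample=True makes each surrogate reuse the original map's values,
# i.e. an (approximately) SA-preserving permutation of the input
gen = Base(x=brain_map, D=distmat, resample=True, seed=42)
surrogates = gen(n=100)  # shape: (100, n_parcels)

# binarize each surrogate with the same threshold as the empirical map
# to obtain null ROI masks
threshold = 3.0  # placeholder
null_masks = surrogates > threshold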