murraylab / brainsmash

Brain Surrogate Maps with Autocorrelated Spatial Heterogeneity
GNU General Public License v3.0

Suggestions for handling bilateral surrogates #12

Open rmarkello opened 4 years ago

rmarkello commented 4 years ago

Heyo!

In most of my analyses I'm interested in generating whole-brain surrogates, but for obvious reasons we have to generate distance matrices separately for each hemisphere so the surrogates need to be generated independently. This often results in surrogates with values that are dramatically mismatched across hemispheres (e.g., one hemisphere has a mean surrogate value of ~5 and the other ~30). This makes recombining them to then e.g., correlate with a different map problematic.

I can pass resample=True to ensure the data are resampled to the original map, but in some instances this does (as the docs suggest) noticeably decrease the fit between the surrogate and empirical variograms. I was wondering if there were other options / suggestions you'd have that might be able to better yoke the values in the two hemispheres together somehow? I can imagine e.g., z-scoring the surrogates separately for each hemisphere before recombining but I imagine this, too, would decrease variogram fit...

Thanks in advance for any help!

jbburt commented 4 years ago

Hey Ross,

The variogram is actually insensitive to mean shifts, so you should be able to just de-mean both unilateral maps prior to combining them. I hadn't given this a lot of thought, but it makes more sense to do this automatically, so I've just pushed an update to the code that does this. The variogram is also insensitive to sign flips, though... I'm really not sure how to handle that (or if it needs handling?).
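To illustrate the point about mean shifts and sign flips, here is a minimal NumPy sketch. It does not use BrainSMASH itself: `pairwise_variogram` is a hypothetical helper that computes half the squared difference for every pair of regions, which is the quantity a variogram averages within distance bins, and the toy hemisphere values are made up for the example.

```python
import numpy as np

def pairwise_variogram(x):
    """Half the squared difference for every pair of regions (i < j).

    Hypothetical helper for illustration; not part of BrainSMASH.
    """
    diff = x[:, None] - x[None, :]
    iu = np.triu_indices(x.size, k=1)
    return 0.5 * diff[iu] ** 2

rng = np.random.default_rng(0)
lh = rng.normal(loc=5.0, size=50)   # toy left-hemisphere surrogate
rh = rng.normal(loc=30.0, size=50)  # toy right-hemisphere surrogate

v = pairwise_variogram(lh)
# Adding a constant or flipping the sign leaves every pairwise term unchanged.
shift_ok = np.allclose(v, pairwise_variogram(lh + 100.0))
flip_ok = np.allclose(v, pairwise_variogram(-lh))

# De-mean each hemisphere separately, then concatenate into one bilateral map.
bilateral = np.concatenate([lh - lh.mean(), rh - rh.mean()])
```

Because only differences between regions enter the calculation, de-meaning each hemisphere does not change its own variogram, but it does remove the arbitrary offset between the two hemispheres.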

One other thing you might try is building a 2n x 2n distance matrix which has a block diagonal structure such that the upper left quadrant is the left hemisphere distance matrix, the lower right quadrant is the right hemisphere distance matrix, and the two off-diagonal quadrants are infinity (not sure how the code will handle this but you can try it, maybe make them NaNs instead?). This procedure would in theory result in a smoothing parameter optimization step that is informed by both hemispheres at once, rather than one in isolation.
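A block-diagonal matrix of this kind could be assembled roughly as follows. This is only a sketch: `bilateral_distmat` is a hypothetical helper name, the toy coordinates stand in for real hemisphere distance matrices, and whether BrainSMASH accepts the result is exactly the open question raised above.

```python
import numpy as np

def bilateral_distmat(d_lh, d_rh, fill=np.inf):
    """Stack two hemisphere distance matrices into one block-diagonal matrix.

    Hypothetical helper; off-diagonal quadrants are set to `fill`.
    """
    n_lh, n_rh = d_lh.shape[0], d_rh.shape[0]
    # Explicit float dtype so that inf/NaN fill values are representable.
    out = np.full((n_lh + n_rh, n_lh + n_rh), fill, dtype=float)
    out[:n_lh, :n_lh] = d_lh   # upper-left quadrant: left hemisphere
    out[n_lh:, n_lh:] = d_rh   # lower-right quadrant: right hemisphere
    return out

# Toy example: Euclidean distances between 4 random points per hemisphere.
rng = np.random.default_rng(0)
xyz = rng.normal(size=(4, 3))
d = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
D = bilateral_distmat(d, d)
```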

rmarkello commented 4 years ago

Awesome! Thanks so much for the quick response. I guess I could've given this a few more seconds of thought before creating this issue, but I really appreciate the swift code update 🙌

The 2n x 2n distance matrix doesn't error out, but it doesn't quite work: all the deltas >= 0.5 (assuming a symmetric parcellation) yield nonsense maps. That is, when the delta is such that some of the inf values in the distance matrix are included in the kernel, the sm_xperm result in the Base.__call__() function (L129) is a masked array where all the values are masked. This means that the final regression step will return identical values for all deltas >= 0.5 (with betas = 0).

One potentially straightforward solution would be to limit the deltas to be scaled for each hemisphere (i.e., if I provide bilateral data my delta of 0.9 should only take the 90% KNN for a single hemisphere, not both hemispheres), but it's a Monday so I can't think of a particularly nice way for users to specify their distance matrix is bilateral (beyond a "dumb" check for whether there are inf values in it which seems liable to breaking).
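The neighbor-counting argument and the proposed rescaling can be sketched like this. Both functions are hypothetical and not part of BrainSMASH; in particular, the assumption that the neighborhood size is `int(delta * n_rows)` is a guess at the internals made only for illustration.

```python
import numpy as np

def n_infinite_neighbors(distmat, delta):
    """Largest number of inf distances among any region's k nearest neighbors.

    Assumes k = int(delta * n_rows), which is an illustrative guess.
    """
    n = distmat.shape[0]
    k = int(delta * n)
    # Drop the zero self-distance on the diagonal before ranking neighbors.
    off_diag = distmat[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    nearest = np.sort(off_diag, axis=1)[:, :k]
    return int(np.isinf(nearest).sum(axis=1).max())

def per_hemisphere_delta(delta, n_hemi, n_total):
    """Rescale delta so it refers to a fraction of one hemisphere only."""
    return delta * n_hemi / n_total

# Toy bilateral matrix: 10 regions per hemisphere, inf between hemispheres.
pos = np.arange(10, dtype=float)
d = np.abs(pos[:, None] - pos[None, :])
D = np.full((20, 20), np.inf)
D[:10, :10] = d
D[10:, 10:] = d

bad = n_infinite_neighbors(D, 0.8)                                  # reaches across hemispheres
fixed = n_infinite_neighbors(D, per_hemisphere_delta(0.8, 10, 20))  # stays within one hemisphere
```

With a symmetric parcellation each region has only n - 1 finite neighbors, so under this assumption any delta of 0.5 or more pulls at least one inf into the kernel, which matches the behavior described above.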

Let me know if you have any thoughts, and thanks again!

llevitis commented 3 years ago

Hi @rmarkello and @jbburt - I came across this issue as I was also trying to use BrainSMASH as part of a whole brain analysis. I tried the suggested approach of creating a 2N x 2N matrix and setting the two off-diagonal quadrants to NaN, but I get an error about the NaN values when I call Base:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

If either of you have any suggestion for another value to pass instead of NaN, or about what I might need to do differently in constructing the distance matrix, I would appreciate the input. I would also like to know if there have been any relevant updates made to the toolbox since @rmarkello first made this issue. Thank you in advance for your help!

jbburt commented 3 years ago

Hi Liza,

Notice that there is only a slight distinction between 1) computing surrogate maps for each hemisphere independently and then combining them to form a bilateral surrogate map, and 2) using a 2N x 2N distance matrix with NaNs/INFs for off-diagonal matrix elements. The only difference is that the latter fits the variogram (and thus, the spatial autocorrelation) for the two hemispheres in aggregate, rather than separately. So regardless of which of these two methods you were to use, the two hemispheres for each bilateral surrogate map are effectively uncorrelated.

What I recommend you do is generate a set of surrogate maps for one hemisphere, then mirror these unilateral surrogate maps contralaterally to get the other hemisphere. This is for a few reasons: 1) biological brain maps often exhibit a high degree of bilateral symmetry. This is something you can test empirically using your target brain map. 2) Because of 1), mirroring contralaterally is probably going to yield a more conservative statistical estimate. In other words, it should be more difficult to get a significant result when mirroring the surrogate maps rather than generating them independently (or using the 2N x 2N approach). 3) If you generate each hemisphere independently, you're introducing the assumption into your null hypothesis that the two hemispheres are completely uncorrelated. Perhaps this makes sense for your application but in general I'm not sure whether this is a useful null hypothesis to reject.
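A rough sketch of the symmetry check and the mirroring step, in plain NumPy. The helper names are hypothetical, and the code assumes a symmetric parcellation in which left-hemisphere regions come first and right-hemisphere regions follow in the same homotopic order. The random array stands in for unilateral surrogates that would come from BrainSMASH.

```python
import numpy as np

def hemispheric_symmetry(brain_map, n_lh):
    """Pearson correlation between homotopic left and right regions.

    Assumes LH regions first, RH regions second, in matching order.
    """
    lh, rh = brain_map[:n_lh], brain_map[n_lh:]
    return float(np.corrcoef(lh, rh)[0, 1])

def mirror_surrogates(surr_lh):
    """Copy unilateral surrogates, shape (n_surr, n_lh), to both hemispheres."""
    return np.hstack([surr_lh, surr_lh])

rng = np.random.default_rng(0)
n_lh = 10

# Toy target map with strong but imperfect bilateral symmetry.
lh_target = rng.normal(size=n_lh)
target = np.concatenate([lh_target, lh_target + 0.1 * rng.normal(size=n_lh)])
r_sym = hemispheric_symmetry(target, n_lh)

# Placeholder for left-hemisphere surrogates generated elsewhere.
surr_lh = rng.normal(size=(5, n_lh))
surr_bilateral = mirror_surrogates(surr_lh)
```

A high value of `r_sym` for the empirical map would support the mirroring approach; a low value suggests the symmetry assumption deserves a closer look before relying on it.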

As a side note, I think the most theoretically principled approach to generating bilateral surrogate maps would be to compute the full 2N x 2N distance matrix while tracing streamlines through inter-hemispheric corpus-callosal tracts (in other words, actually computing the values in the two off-diagonal quadrants). But in all likelihood this would be a significant methodological lift so I'm not sure it's a path that you want to go down.

llevitis commented 3 years ago

Hi @jbburt - thank you so much for the thorough explanation! That makes a lot of sense about mirroring contralaterally being a safer or more conservative way of determining significance. Regarding the suggestion to trace streamlines through inter-hemispheric corpus-callosal tracts, that certainly sounds like the most optimal approach and one that I will take a stab at doing. In the meantime, I'll generate preliminary significance values using the simpler approach first.