Multicollinearity - Githubissues

rob-luke commented 2 years ago

I recently saw an insightful presentation from @helenacockx which described how multicollinearity/collinearity (wikipedia) can affect fNIRS GLM analysis. MNE-NIRS provides support for both averaging and GLM analysis. The issue of collinearity in fNIRS GLM analysis has been discussed in the literature: "a challenge of these regressor models is collinearity introduced between the task and nuisance regressors, which can happen if the systemic physiological response is correlated with the performance of the task. Collinearity in the regression analysis can destabilize it due to poor mathematical conditioning of the model and can produce unpredictable results." (Santosa et al., 2020). This is a particular concern because short channels, which are designed to measure systemic rather than neural activity and are included in the design matrix to remove systemic contributions to the signal, can be highly correlated with the task regressor.
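To illustrate the "poor mathematical conditioning" mentioned in the quote, here is a minimal sketch on simulated data. It is not MNE-NIRS code; the boxcar task, the noise level and the helper `condition_number` are all assumptions made for illustration. It compares the condition number of a design matrix whose short-channel regressor is independent of the task with one whose short-channel regressor closely follows the task.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Hypothetical boxcar task regressor: 20 samples on, 20 samples off.
task = (np.arange(n) % 40 < 20).astype(float)
noise = rng.standard_normal(n)


def condition_number(*regressors):
    """Condition number of [regressors..., constant] after scaling each column to unit norm."""
    X = np.column_stack(regressors + (np.ones(len(regressors[0])),))
    X = X / np.linalg.norm(X, axis=0)
    return np.linalg.cond(X)


# Short-channel regressor unrelated to the task.
cond_independent = condition_number(task, noise)

# Short-channel regressor that closely follows the task (assumed worst case).
cond_collinear = condition_number(task, task + 0.1 * noise)

print(f"independent: {cond_independent:.1f}, collinear: {cond_collinear:.1f}")
```

The collinear design has a clearly larger condition number, which means small changes in the data can produce large changes in the estimated betas.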

Relevant reading:

https://dartbrains.org/content/GLM_Single_Subject_Model.html#multicollinearity
Santosa et al. (2020) used PCA to remove collinearity between the nuisance regressors, but this did not solve the issue of collinearity between the short channels and the task regressor: "This was only used to remove collinearity from using multiple short-separation channels in the model, but did not reduce any collinearity between the task regressors and the short-separation nuisance regressors."
Need to find more articles on the topic
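The PCA point above can be sketched on simulated data. This is an illustration under stated assumptions, not the procedure from the paper: the four short channels, their weights and the noise levels are made up, and the PCA is a plain SVD of the centered channels. The principal components are uncorrelated with each other, but the first one still correlates with the task.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Hypothetical boxcar task and a systemic signal that partly follows it.
task = (np.arange(n) % 40 < 20).astype(float)
systemic = 0.8 * task + 0.3 * rng.standard_normal(n)

# Four simulated short channels that all pick up the systemic signal.
weights = np.array([1.0, 0.9, 1.1, 0.8])
shorts = systemic[:, None] * weights + 0.1 * rng.standard_normal((n, 4))

# PCA via SVD of the centered channels; columns of `components` are the scores.
centered = shorts - shorts.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
components = U * S

# Collinearity among the nuisance regressors is removed ...
corr_between = np.corrcoef(components, rowvar=False)
max_off_diag = np.abs(corr_between - np.eye(4)).max()

# ... but the first component is still correlated with the task regressor.
corr_with_task = np.corrcoef(components[:, 0], task)[0, 1]

print(f"max correlation between components: {max_off_diag:.2e}")
print(f"|correlation| of PC1 with task: {abs(corr_with_task):.2f}")
```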

Tasks

[ ] Add metrics to quantify collinearity in the design matrix
[ ] Implement methods to mitigate the problem of collinearity

References

Hendrik Santosa, Xuetong Zhai, Frank Fishburn, Patrick J. Sparto, Theodore J. Huppert, "Quantitative comparison of correction techniques for removing systemic physiological signal in functional near-infrared spectroscopy studies," Neurophoton. 7(3) 035009 (23 September 2020)

helenacockx commented 2 years ago

Thanks for picking this up Rob! The article of Santosa is indeed the only fNIRS-related paper that I found on this topic. In this paper, they propose to solve the issue of collinearity by performing a regularized mixed-effects model estimation. However, they concluded that the mixed-effects model showed only a slight improvement in performance (type I errors) compared to the AR-IRLS model, and that this came at the cost of a 10-fold increase in computation time.

I found that this article also gave insight into the concept of multicollinearity and into why it should not be solved with orthogonalization: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4412813/

@robertoostenveld asked two of our colleagues at the Donders Institute with more fMRI expertise how to deal with this issue. They described a two-step approach (first regressing out the short channels and then performing the GLM without nuisance regressors) as a conservative but robust method (reducing type I errors at the cost of possible type II errors). "The alternative is to better understand the noise, the colinearity, try to avoid it by changing the task design (not possible here anymore) and apply a more optimized but also more liberal (i.e. more false alarms) joint fit. One of the colleagues also came up with the solution to use model comparison, which goes in the direction of Bayesian stats."
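To make the difference between the two-step approach and the joint fit concrete, here is a small sketch using ordinary least squares on simulated data. The signal model, the coefficients and the helper `ols` are assumptions for illustration and not MNE-NIRS functions. With a single task regressor, the two-step beta equals the joint-fit beta multiplied by (1 - R²), where R² is the share of task variance explained by the short channel, so the two-step estimate shrinks as collinearity grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Simulated data: the short channel is correlated with the boxcar task.
task = (np.arange(n) % 40 < 20).astype(float)
short = 0.8 * task + 0.3 * rng.standard_normal(n)
signal = 1.0 * task + 0.5 * short + 0.2 * rng.standard_normal(n)
ones = np.ones(n)


def ols(X, y):
    """Ordinary least-squares coefficients of y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Joint fit: task and short channel in the same design matrix.
beta_joint = ols(np.column_stack([task, short, ones]), signal)[0]

# Two-step: regress out the short channel first, then fit the task alone.
nuisance = np.column_stack([short, ones])
cleaned = signal - nuisance @ ols(nuisance, signal)
beta_two_step = ols(np.column_stack([task, ones]), cleaned)[0]

# Share of task variance explained by the short channel.
task_residual = task - nuisance @ ols(nuisance, task)
r_squared = 1.0 - task_residual.var() / task.var()

print(f"joint beta:    {beta_joint:.3f}")
print(f"two-step beta: {beta_two_step:.3f}")
print(f"joint beta * (1 - R^2): {beta_joint * (1 - r_squared):.3f}")
```

The shrinkage of the two-step beta is consistent with the description of that approach as conservative.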

helenacockx commented 2 years ago

I have also been thinking about a metric to quantify collinearity. The Variance Inflation Factor (VIF) is often used in fMRI studies and describes how much the variance of one of the estimated regression coefficients is inflated by the existence of correlation among the predictor variables in the model (https://online.stat.psu.edu/stat501/lesson/12/12.4). If I understand it correctly, we are mostly interested in how much the task regressor can be explained by the short channels (and less in how much the short channels are correlated with each other, because we are not interested in the betas of the nuisance regressors), so it might be enough to only calculate the VIF of the task regressor. However, this is calculated for each model/subject, so it is not clear to me how to deal with this VIF on the group level. Furthermore, when using this VIF, we should think carefully about the threshold: https://link.springer.com/article/10.1007%2Fs11135-006-9018-6
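A minimal sketch of computing the VIF for the task regressor only, assuming the design matrix is available as a plain NumPy array without a constant column. The function `vif` and the simulated regressors are hypothetical and not part of MNE-NIRS. It uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing the chosen column on all other columns.

```python
import numpy as np


def vif(design, column):
    """Variance inflation factor of one design-matrix column.

    Regresses the chosen column on all other columns plus an intercept
    and returns 1 / (1 - R^2). Assumes `design` has no constant column.
    """
    design = np.asarray(design, dtype=float)
    target = design[:, column]
    others = np.delete(design, column, axis=1)
    others = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    residual = target - others @ coef
    r_squared = 1.0 - residual.var() / target.var()
    return 1.0 / (1.0 - r_squared)


rng = np.random.default_rng(0)
n = 400
task = (np.arange(n) % 40 < 20).astype(float)  # hypothetical boxcar task

# Short channel unrelated to the task versus one that partly follows it.
short_independent = rng.standard_normal(n)
short_correlated = 0.8 * task + 0.3 * rng.standard_normal(n)

vif_independent = vif(np.column_stack([task, short_independent]), column=0)
vif_correlated = vif(np.column_stack([task, short_correlated]), column=0)

print(f"VIF of task, independent short channel: {vif_independent:.2f}")
print(f"VIF of task, correlated short channel:  {vif_correlated:.2f}")
```

This gives one value per design matrix, so it does not by itself address the group-level question or the choice of threshold raised above.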

mne-tools / mne-nirs

Multicollinearity #413
