jennyrieck / C-MARINeR

Connectivity - Multivariate Analyses and Resampling Inference for Neuroimaging in R

Positive semi-definite requirement #15

Closed derekbeaton closed 5 years ago

derekbeaton commented 5 years ago

This is a very serious issue for us to resolve. covSTATIS (and PCA) require covariance matrices that are square, symmetric, and positive semi-definite (PSD). The square/symmetric part is easy and is an enforced requirement.

PSD, on the other hand, does not always hold for some "connectivity" data. These matrices do not adhere to the definition of a covariance matrix, but do contain correlation/covariance values. They typically result from selective use of items to correlate, e.g., skipping over certain pairs, with different pairs skipped for each pairwise cor/cov.
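To make the problem concrete, here is a minimal sketch in base R. It assumes that "selective use of items" behaves like pairwise deletion, where each correlation is computed from a different subset of rows; the toy data are hypothetical and chosen only so that the effect is easy to see.

```r
# Toy data (hypothetical): every pair of variables overlaps on a different set of rows.
x1 <- c(1, 2, 3, 4,  1, 2, 3, 4,  NA, NA, NA, NA)
x2 <- c(1, 2, 3, 4,  NA, NA, NA, NA,  1, 2, 3, 4)
x3 <- c(NA, NA, NA, NA,  1, 2, 3, 4,  4, 3, 2, 1)
X <- cbind(x1, x2, x3)

# Each entry uses only the rows where both variables are observed.
R <- cor(X, use = "pairwise.complete.obs")

# R is square and symmetric with a unit diagonal, so it looks like a correlation matrix...
is_sym <- isSymmetric(R)

# ...but its spectrum (approximately 2, 2, -1) contains a clearly negative eigenvalue.
evals <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
is_psd <- min(evals) >= 0
```

Here `is_sym` is `TRUE` while `is_psd` is `FALSE`: every entry is a valid correlation, but no single data set could have produced all three at once.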

This poses a theoretical/computational/programmatic problem. There are three options:

derekbeaton commented 5 years ago

The final point also brings up two possibilities: apply this to each matrix, or only the compromise? I suppose that, so long as the compromise matrix is PSD, this is good enough. However, the MFA norm poses a problem at the per-matrix level.

derekbeaton commented 5 years ago

These issues can also result from having more ROIs/voxels than time points.
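A small base R sketch of this second case, assuming simulated data with hypothetical sizes (5 time points, 20 ROIs). The covariance matrix is PSD in exact arithmetic but rank deficient, so most eigenvalues are zero up to round-off and can come out very slightly negative.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n_time <- 5                        # hypothetical number of time points
n_roi  <- 20                       # hypothetical number of ROIs
X <- matrix(rnorm(n_time * n_roi), nrow = n_time, ncol = n_roi)

S <- cov(X)                        # 20 x 20, but rank is at most n_time - 1
evals <- eigen(S, symmetric = TRUE, only.values = TRUE)$values

tol <- sqrt(.Machine$double.eps)   # assumed tolerance, roughly 1.5e-8
n_nonzero <- sum(abs(evals) > tol) # eigenvalues that are meaningfully non-zero

# The remaining values are round-off noise around zero; some may be negative.
noise_range <- range(evals[abs(evals) <= tol])
```

With these sizes only `n_time - 1` eigenvalues are meaningfully non-zero, so a test of `evals >= 0` with no tolerance can reject a matrix that is PSD in theory.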

derekbeaton commented 5 years ago

I think this could be handled in one of two ways for now: either I rewrite geigen() or tolerance.eigen() in the GSVD package to use the SVD for stability, or I set a much higher threshold, like the one in nearPD (1e-6).

derekbeaton commented 5 years ago

I think I'll take the approach of using svd(), while making sure that the matrix is strictly square and symmetric and checking for singular/eigenvalues with large negative magnitude and/or any complex values (which shouldn't occur).
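One way such a check could look is sketched below in base R. This is not the GSVD implementation: the function name `signed_spectrum_svd()`, its arguments, and the relative tolerance are all assumptions made for illustration. Since `svd()` only returns non-negative singular values, the sketch recovers each eigenvalue's sign from the inner product of the matching left and right singular vectors.

```r
# Hypothetical helper (illustration only, not part of GSVD or covstatis()).
signed_spectrum_svd <- function(x, tol = sqrt(.Machine$double.eps)) {
  if (!is.matrix(x) || is.complex(x) || !is.numeric(x)) {
    stop("x must be a real numeric matrix")
  }
  if (nrow(x) != ncol(x)) stop("x must be square")
  if (!isSymmetric(unname(x))) stop("x must be symmetric")

  res <- svd(x)
  # For a symmetric matrix, each left singular vector is +/- the right one;
  # the sign of their inner product is the sign of the eigenvalue.
  signs <- sign(colSums(res$u * res$v))
  lambda <- res$d * signs

  list(
    values = lambda,
    # Flag eigenvalues that are negative beyond a tolerance relative to the largest.
    large_negative = any(lambda < -tol * max(res$d))
  )
}

A <- matrix(c(1, 2,
              2, 1), nrow = 2, byrow = TRUE)   # eigenvalues 3 and -1
out <- signed_spectrum_svd(A)
```

The sign recovery is only reliable when no two eigenvalues share the same magnitude with opposite signs; in that case the singular vectors are not unique and `eigen(x, symmetric = TRUE)` is the safer route.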

derekbeaton commented 5 years ago

I don't know anymore. I will revisit this periodically to make a suitable decision. For now, there is a switch to just let offending matrices pass through.

derekbeaton commented 5 years ago

After much hemming and hawing, and excessive indecision, I finally fixed this. It required two sets of fixes. One set was in the GSVD package, where I had to handle the tolerance checks of eigen/singular values better and correctly. The second set was to pass a tolerance through covstatis() to GSVD, while also allowing us to choose whether to strictly enforce the positive semi-definite requirement.
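The general shape of a tolerance plus a strictness switch can be sketched as follows. This is a base R illustration under stated assumptions, not the actual covstatis() or GSVD code: the function name `psd_eigen()`, the `tol` and `strict` arguments, and the use of an absolute tolerance are all hypothetical.

```r
# Hypothetical sketch; names and defaults are assumptions.
psd_eigen <- function(x, tol = sqrt(.Machine$double.eps), strict = TRUE) {
  stopifnot(is.matrix(x), is.numeric(x), nrow(x) == ncol(x),
            isSymmetric(unname(x)))

  e <- eigen(x, symmetric = TRUE)

  # Eigenvalues within tol of zero are treated as zero; only values
  # below -tol count as violations of the PSD requirement.
  negative <- e$values < -tol
  if (any(negative)) {
    msg <- sprintf("%d eigenvalue(s) below -tol: matrix is not PSD", sum(negative))
    if (strict) stop(msg) else warning(msg)
  }

  # Keep only the components with eigenvalues above the tolerance.
  keep <- e$values > tol
  list(values = e$values[keep],
       vectors = e$vectors[, keep, drop = FALSE])
}

A <- matrix(c(1,  1,  1,
              1,  1, -1,
              1, -1,  1), nrow = 3, byrow = TRUE)   # eigenvalues 2, 2, -1

# Non-strict: the offending matrix passes through with a warning.
lenient <- suppressWarnings(psd_eigen(A, strict = FALSE))

# Strict: the same matrix is rejected.
strict_result <- tryCatch(psd_eigen(A, strict = TRUE),
                          error = function(e) "rejected")
```

In the non-strict case the negative component is dropped and two components are kept; in the strict case the call stops with an error.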

Phew.