stephenslab / mashr

An R package for multivariate adaptive shrinkage.
https://stephenslab.github.io/mashr

Inflated false positives caused by using rank 1 prior covariance matrix #91

Open · DongyueXie opened this issue 3 years ago

DongyueXie commented 3 years ago

We observe inflated false positives when using a rank-1 prior covariance matrix in mash. The reason is the breakdown of the lfsr. (The rank-1 assumption is not explicitly specified in the mash model, but the ED algorithm preserves the rank of its initializations, and since the initializations are generated using PCA, most of them are rank-1 matrices. The TEEM algorithm does not preserve rank, but it truncates the eigenvalues, so the estimated covariance matrices can still be low rank.)

Currently, for V_i = I, we propose adjusting U_hat to U_hat + s^2 I, where s^2 = 2/sqrt(n_k). An illustration of the problem and the fix is available here: https://dongyuexie.github.io/maskedmash/mashFIX.html
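A minimal sketch of what the proposed adjustment looks like in R. The function name `fatten_rank1` and the argument names are illustrative, not part of mashr; the only assumption taken from the proposal above is the diagonal inflation s^2 = 2/sqrt(n_k):

```r
# Hypothetical helper implementing the proposed fix for the V_i = I case.
# U_hat: an estimated prior covariance matrix (possibly rank deficient).
# n_k:   the relevant sample size (e.g. number of strong effects).
fatten_rank1 <- function(U_hat, n_k) {
  s2 <- 2 / sqrt(n_k)                 # proposed s^2 = 2/sqrt(n_k)
  U_hat + s2 * diag(nrow(U_hat))      # add s^2 to the diagonal
}

# Example: a rank-1 matrix from an outer product becomes full rank.
u  <- c(1, 0.5)
U  <- tcrossprod(u)                   # rank 1, singular
Uf <- fatten_rank1(U, n_k = 100)
qr(U)$rank                            # 1
qr(Uf)$rank                           # 2
```

Adding a strictly positive diagonal term makes the prior covariance full rank, which is what prevents the lfsr breakdown described above.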

For the general case of V_i, a solution will come out soon.

gaow commented 3 years ago

@DongyueXie we confirm that the aforementioned problem contributed to the inflated lfsr in multivariate fine-mapping with a mixture prior. In my simulations the ED prior has inflated lfsr, but when using the canonical prior with uniform weights there is no issue, which is consistent with your observation. @zouyuxin has verified the same in her simulations with the ED-based prior, but in her case even the canonical prior shows inflation. The simple fix you posted above seems to solve case examples in her simulations, though we have not assessed it systematically yet. We are wondering whether there is a concrete plan moving forward, as it would help with our projects. In particular, what do you think about fixing the canonical prior too? The current mashr workflow excludes the canonical prior from the ED step.

DongyueXie commented 3 years ago

@gaow Hi Gao, I need more information on "even canonical prior has an inflation" — maybe an example.

zouyuxin commented 3 years ago

I checked some examples with the canonical prior. It turns out the inflation there is not related to the ED rank-deficient covariance problem.

stephens999 commented 3 years ago

Although it was not the explanation in this case, I guess the canonical matrix of all 1s could cause this problem in principle. The other rank-1 canonical matrices (singletons) should not have this problem because of their sparsity.

I think the short-term solution is for our group, in our analyses, to modify all rank-1 matrices input to mashr (or mvSusie) by adding a small diagonal term.

In the long term we will want either to add a step that "fattens" (adds a diagonal term to) rank-1 matrices, or to modify the functions cov_canonical and cov_ed (or maybe cov_ud, which will probably replace cov_ed when it is ready) to do that fattening automatically.

stephens999 commented 3 years ago

I guess we could immediately modify cov_ed to add a small diagonal (1/sqrt(nstrong)) to all covariance matrices at the end? I think this would be low-risk and would fix many potential problems with rank-1 matrices.
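For discussion, a sketch of what that could look like as a wrapper, without touching cov_ed itself. The wrapper name is hypothetical; `mashr::cov_ed` and its `subset` argument are real, and the 1/sqrt(nstrong) constant follows the suggestion above:

```r
# Hypothetical wrapper: run mashr::cov_ed as usual, then add a small
# diagonal term eps = 1/sqrt(nstrong) to every estimated covariance.
cov_ed_fattened <- function(data, Ulist_init, subset = NULL, ...) {
  U_ed <- mashr::cov_ed(data, Ulist_init, subset = subset, ...)
  # nstrong = number of effects used to fit the prior (assumption:
  # all effects when subset is NULL, else the subset size).
  nstrong <- if (is.null(subset)) nrow(data$Bhat) else length(subset)
  eps <- 1 / sqrt(nstrong)
  lapply(U_ed, function(U) U + eps * diag(nrow(U)))
}
```

Doing this at the end, on the returned list, would leave the ED fitting itself unchanged, which seems consistent with the "low-risk" framing.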