DongyueXie opened this issue 3 years ago
@DongyueXie We confirm that the aforementioned problem contributed to the inflated lfsr in multivariate fine-mapping with a mixture prior. In my simulations the ED prior has inflated lfsr, but when using the canonical prior with uniform weights there is no issue. This is consistent with your observation. @zouyuxin has verified the same with the ED-based prior in her simulations, but additionally in her case even the canonical prior shows some inflation. The simple fix you posted above seems to resolve case examples from her simulations, although we have not assessed it systematically yet. We are wondering whether there is a concrete plan for moving forward, so it can help with our projects. In particular, what do you think about fixing the canonical prior too? The current mashr workflow excludes the canonical prior from the ED step.
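For reference, here is a minimal sketch of that workflow (adapted from the mashr vignette, run on toy data; variable names are illustrative). The ED step is fit only to the data-driven PCA matrices, and the canonical matrices are appended afterwards, untouched:

```r
library(mashr)
set.seed(1)
sim    <- simple_sims(500, 5, 1)                  # toy data from the vignette
data   <- mash_set_data(sim$Bhat, sim$Shat)
U.c    <- cov_canonical(data)                     # canonical matrices, no ED refinement
m.1by1 <- mash_1by1(data)
strong <- get_significant_results(m.1by1)         # rows used for the data-driven matrices
U.pca  <- cov_pca(data, npc = 5, subset = strong)
U.ed   <- cov_ed(data, U.pca, subset = strong)    # ED sees only the PCA matrices
m      <- mash(data, c(U.c, U.ed))                # canonical + ED matrices combined here
```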
@gaow Hi Gao, I need more information on "even canonical prior has an inflation", maybe an example.
I checked some examples with the canonical prior. It turns out this is not related to the ED rank-deficient covariance problem.
Although it was not the explanation in this case, I guess the canonical matrix of all 1s could cause this problem in principle. The other rank-1 canonical matrices (singletons) should not have this problem because of their sparsity.
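To illustrate (a quick sketch on toy data, not from the analyses discussed above): among the matrices returned by `cov_canonical`, the all-ones equal_effects matrix is rank 1 and dense, while each singleton matrix is rank 1 with a single nonzero diagonal entry.

```r
library(mashr)
set.seed(1)
sim  <- simple_sims(500, 5, 1)
data <- mash_set_data(sim$Bhat, sim$Shat)
U.c  <- cov_canonical(data)
# equal_effects is the 5x5 matrix of all 1s (rank 1, dense);
# each singleton matrix has one nonzero diagonal entry (rank 1, sparse).
sapply(U.c, function(U) qr(U)$rank)
```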
I think the short-term solution is for our group, in our analyses, to modify all r1 matrices input to mashr (or mvSusie) by adding a small diagonal term.
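A minimal sketch of that manual fix (my own code, not part of mashr or mvSusie; the epsilon value is illustrative):

```r
# Add a small diagonal term to every prior covariance matrix in a list
# before passing it to mash() or mvsusie().
fatten_Ulist <- function(Ulist, eps = 1e-2) {
  lapply(Ulist, function(U) U + eps * diag(nrow(U)))
}
# e.g. m <- mash(data, fatten_Ulist(c(U.c, U.ed)))
```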
In the long term we will want to either add a step to "fatten" (add a diagonal term to) r1 matrices, or modify the functions `cov_canonical` and `cov_ed` (or maybe `cov_ud`, which will probably replace `cov_ed` when it is ready) to do that fattening automatically.

I guess we could immediately modify `cov_ed` to add a small diagonal (1/sqrt(nstrong)) to all covariance matrices at the end?
I think this would be low-risk and fix many potential problems with r1 matrices.
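A hedged sketch of what that change could look like, written here as a wrapper around `cov_ed` rather than a modification of the function itself (`cov_ed_fattened` is a hypothetical name, not part of mashr):

```r
library(mashr)
# Run cov_ed as usual, then add a small diagonal (1/sqrt(nstrong)) to every
# returned covariance matrix, where nstrong is the number of strong effects
# used to fit the ED matrices.
cov_ed_fattened <- function(data, Ulist_init, subset = NULL, ...) {
  U.ed    <- cov_ed(data, Ulist_init, subset = subset, ...)
  nstrong <- if (is.null(subset)) nrow(data$Bhat) else length(subset)
  eps     <- 1 / sqrt(nstrong)
  lapply(U.ed, function(U) U + eps * diag(nrow(U)))
}
```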
We observe inflated false positives when using a rank-1 prior covariance matrix in mash. The reason is a breakdown of the lfsr. (The rank-1 assumption is not explicitly specified in the mash model, but the ED algorithm preserves the rank of its initializations. The initializations are generated using PCA, hence most of them are rank-1 matrices. The TEEM algorithm does not preserve the rank, but it truncates the eigenvalues, so the estimated covariance matrices can still be low rank.)
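A quick sketch of the rank-preservation point on toy data (my own check, not from the linked analysis): the individual PCA matrices are rank 1, and the ED output keeps those ranks.

```r
library(mashr)
set.seed(1)
sim    <- simple_sims(500, 5, 1)
data   <- mash_set_data(sim$Bhat, sim$Shat)
strong <- get_significant_results(mash_1by1(data))
U.pca  <- cov_pca(data, npc = 5, subset = strong)  # individual PC matrices are rank 1
U.ed   <- cov_ed(data, U.pca, subset = strong)     # ED preserves the initialization ranks
sapply(U.pca, function(U) qr(U)$rank)
sapply(U.ed,  function(U) qr(U)$rank)
```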
Currently, for V_i = I, we propose to adjust U_hat to U_hat + s^2 I, where s^2 = 2/sqrt(n_k). An illustration of the problem and the fix is available here: https://dongyuexie.github.io/maskedmash/mashFIX.html
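In code, the proposed adjustment for V_i = I amounts to something like this (a sketch only; the function name is illustrative, U_hat is an estimated prior covariance matrix and n_k the number of strong effects it was fitted on):

```r
# Adjust U_hat to U_hat + s^2 * I with s^2 = 2 / sqrt(n_k).
adjust_Uhat <- function(U_hat, n_k) {
  s2 <- 2 / sqrt(n_k)
  U_hat + s2 * diag(nrow(U_hat))
}
```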
For the general case of V_i, a solution will come soon.