stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR
Other
174 stars 44 forks source link

susie_rss gives different results using different R #132

Closed Zepeng-Mu closed 2 years ago

Zepeng-Mu commented 3 years ago

Hello, I've been trying to run susie_rss using different R matrices. The first version I used is the correlation matrix of genotype count from a reference panel. Part of which looks like:

            [,1]        [,2]        [,3]        [,4]        [,5]
[1,]  1.00000000  1.00000000 -0.09664246  1.00000000  1.00000000
[2,]  1.00000000  1.00000000 -0.09664246  1.00000000  1.00000000
[3,] -0.09664246 -0.09664246  1.00000000 -0.09664246 -0.09664246
[4,]  1.00000000  1.00000000 -0.09664246  1.00000000  1.00000000
[5,]  1.00000000  1.00000000 -0.09664246  1.00000000  1.00000000

In this way I did not need to scale or center the count matrix, because cor gives the same output. The result makes much sense to me: Screen Shot 2021-06-30 at 13 37 20

I also saw in package documentation that the R option can be:

A p by p symmetric, positive semidefinite matrix.
It can be X'X, the covariance matrix X'X/(n-1), or a correlation matrix.
It should be estimated from the same samples used to compute bhat and shat.
Using an out-of-sample matrix may produce unreliable results.

So the second matrix I used is t(geno) %*% geno, where geno is the nxp matrix of counts that is centered and scaled. The matrix looks like this:

           1:37312926 1:37312963 1:37313050 1:37313060 1:37313086
1:37312926  3201.0000  3201.0000  -309.3525  3201.0000  3201.0000
1:37312963  3201.0000  3201.0000  -309.3525  3201.0000  3201.0000
1:37313050  -309.3525  -309.3525  3201.0000  -309.3525  -309.3525
1:37313060  3201.0000  3201.0000  -309.3525  3201.0000  3201.0000
1:37313086  3201.0000  3201.0000  -309.3525  3201.0000  3201.0000

This matrix only differs from the first one by a factor of 3201, as I have 3202 individuals in the 1KG reference panel. So I expected the result from susie_rss would be the same as before. However, using this matrix I got no CS and PIP is flat. Screen Shot 2021-06-30 at 13 36 58

I'm wondering what can lead to such huge difference in PIP when the R is very similar. Should I just use the correlation matrix that is directly calculated from genotype matrix?

Thanks so much!

zouyuxin commented 3 years ago

In susie_rss documentation, R is a p by p correlation matrix, so you should use the correlation matrix. If you multiply R by 3201, the z scores should multiply by sqrt(3201) to get the same result.

stephens999 commented 3 years ago

@zeping where is this documentation that you quoted (which seems to be out of date)

A p by p symmetric, positive semidefinite matrix. It can be X'X, the covariance matrix X'X/(n-1), or a correlation matrix. It should be estimated from the same samples used to compute bhat and shat. Using an out-of-sample matrix may produce unreliable results.

On Wed, Jun 30, 2021 at 2:25 PM Yuxin Zou @.***> wrote:

In susie_rss documentation, R is a p by p correlation matrix, so you should use the correlation matrix. If you multiply R by 3201, the z scores should multiply by sqrt(3201) to get the same result.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/132#issuecomment-871667759, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRROWESZLMQOPW4W33JDTVNVRNANCNFSM47S5WWXQ .

pcarbo commented 3 years ago

@stephens999 It is in susie_ss.R.

Zepeng-Mu commented 3 years ago

Hi, I'm using version "0.11.43" of susieR and found that documentation in ?susie_suff_stat

zouyuxin commented 3 years ago

The documentation you quoted is for susie_suff_stat, which is different from susie_rss. If you are using susie_rss, please check ?susie_rss.

stephens999 commented 3 years ago

@zouyuxin so in susie_suff_stat I find the documentation of the R matrix confusing. In the details it says R = (1/(n-1))X'X But in the documentation for R it says it can also be X'X. why is that?

zouyuxin commented 3 years ago

In susie_suff_stat, the R is converted to a correlation matrix internally. I agree it is clearer to require R to be a correlation matrix in the documentation. I will update the documentation.