stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR

Updated vignette for `susie_rss` #91

Closed gaow closed 4 years ago

dwightman commented 4 years ago

Hello,

I am running susie_rss using Z scores from a GWAS meta-analysis and an LD matrix as the correlation matrix. I created the LD matrix using the plink command '--r square'. All of the SNPs used to create the LD matrix and the Z scores overlap and are in the same order. I flip the sign of the Z score when the A1 allele from the meta-analysis is equal to the A2 allele in the plink bim file used to create the LD matrix.
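A minimal sketch of the sign flip described above, assuming two toy tables whose column names (`snp`, `a1`, `a2`, `z`) are made up for illustration and are not taken from any particular file format. It does not handle strand-ambiguous SNPs (A/T, C/G), which would need extra care.

```r
# Toy summary statistics and a toy plink .bim allele table (hypothetical data).
sumstats <- data.frame(
  snp = c("rs1", "rs2", "rs3"),
  a1  = c("A", "C", "G"),
  a2  = c("G", "T", "A"),
  z   = c(2.5, -1.2, 0.7),
  stringsAsFactors = FALSE
)
bim <- data.frame(
  snp = c("rs1", "rs2", "rs3"),
  a1  = c("A", "T", "G"),
  a2  = c("G", "C", "A"),
  stringsAsFactors = FALSE
)

# Both tables must list the same SNPs in the same order.
stopifnot(identical(sumstats$snp, bim$snp))

same    <- sumstats$a1 == bim$a1 & sumstats$a2 == bim$a2
swapped <- sumstats$a1 == bim$a2 & sumstats$a2 == bim$a1

# Flip the sign where the alleles are swapped; drop SNPs that match neither way.
z_aligned <- ifelse(swapped, -sumstats$z, sumstats$z)
keep      <- same | swapped
z_aligned <- z_aligned[keep]
```

Here `rs2` has its alleles swapped between the two tables, so only its Z score changes sign. Any SNP dropped by `keep` would also have to be dropped from the LD matrix.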

In most (but not all) of the regions that I am testing I get this warning: "Input z does not lie in the space of non-zero eigenvectors of R. The result is thus not reliable. Please refer to https://github.com/stephenslab/susieR/issues/91 for a possible solution."

I was wondering what could be the cause of this warning and how could I remedy it?

As an aside, I have specified r_tol=1e-4 because with the default setting some tests would result in this error: "The correlation matrix is not a positive semidefinite matrix." I am not sure whether this is relevant to the Z score issue.

Cheers, Doug

zouyuxin commented 4 years ago

@dwightman Thanks for reaching out. Sorry, we haven't updated the vignette.

The model assumes that the effect size summary statistics and the LD matrix come from the same data. If they do not, the result could be unreliable.

The error message you got with the default r_tol means your LD matrix is not positive semi-definite. I’ve heard that some people have gotten non-positive semi-definite LD matrices from plink, so this may be a common issue with plink... Could you try computing the correlation matrix using R, Python, or LDstore, and see whether the resulting LD matrix is positive semi-definite?

dwightman commented 4 years ago

No worries, thanks for your response.

Unfortunately, I am not able to access individual-level data for the majority of my samples to create the LD matrices. So I expect that my results will be unreliable.

I have used the ld function in the snpStats R package to generate the LD matrix and it is also not positive semi-definite. I also quickly tried using LDstore to generate a matrix (with the --matrix command) from my plink files and the resulting matrix contained negative values.

Cheers, Doug

stephens999 commented 4 years ago

Negative values in the LD matrix are OK; it just must not have negative eigenvalues.

What data are you computing the LD matrix from? Does it have any missing genotypes? If there are no missing data the LD matrix should always be positive semi-definite. If you can share the genotypes you are using to compute the LD matrix (eg it is a public reference panel) then we could look further into it. Numerical errors are always a potential issue...
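To illustrate the distinction, here is a small sketch in base R. The helper names `min_eig` and `is_psd` and the tolerance are made up for this example and are not part of susieR. The first matrix has a negative entry but is positive semi-definite. The second has entries that are each valid correlations but are mutually inconsistent, which is the kind of matrix that pairwise handling of missing genotypes can produce.

```r
# Smallest eigenvalue of a symmetric matrix.
min_eig <- function(R) {
  min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
}

# Positive semi-definite up to a small numerical tolerance (arbitrary choice).
is_psd <- function(R, tol = 1e-8) min_eig(R) >= -tol

# Negative correlation between two SNPs: still a valid LD matrix.
R_ok <- matrix(c( 1.0, -0.8,
                 -0.8,  1.0), nrow = 2, byrow = TRUE)

# Each entry is a plausible correlation, but no complete data set
# could produce all three at once.
R_bad <- matrix(c( 1.0, 0.9, -0.9,
                   0.9, 1.0,  0.9,
                  -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

is_psd(R_ok)    # TRUE
is_psd(R_bad)   # FALSE
min_eig(R_bad)  # negative
```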

Matthew

gaow commented 4 years ago

@dwightman an LD matrix can contain negative values for negative correlations -- are you actually referring to negative eigenvalues?

@zouyuxin has already updated the vignette adding a section of discussion on external reference LD panels: https://stephenslab.github.io/susieR/articles/finemapping_summary_statistics.html#using-ld-from-reference-panel please let us know if this solves your issue.

Would it be possible to simply use the cor function in R to compute the LD matrix? If you are using the 1000 Genomes reference, the genotype data should not be too big to load into R and compute directly.

dwightman commented 4 years ago

The LD matrices generated from plink and snpStats had negative eigenvalues, which caused the error from susie_rss. I did not test the matrix generated from LDstore though.

I just tried generating a raw file from plink using --recodeA and then using the R function cor() on that genotype data to generate the correlation matrix. This worked well: I no longer needed r_tol=1e-4, and the output specified that "Input z is in space spanned by the non-zero eigenvectors of R". This method worked on all 25 of my genomic risk loci.
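The approach can be sketched as follows. The example writes a tiny stand-in for an additive-coded plink .raw file to a temporary path so that it runs on its own; the genotypes and SNP names are invented, and a real file would be read from wherever plink wrote it.

```r
# Write a tiny stand-in for a plink additive-coded .raw file (hypothetical data).
raw_path <- tempfile(fileext = ".raw")
writeLines(c(
  "FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_C rs3_G",
  "f1 i1 0 0 1 -9 0 1 2",
  "f2 i2 0 0 2 -9 1 1 0",
  "f3 i3 0 0 1 -9 2 0 1",
  "f4 i4 0 0 2 -9 0 2 1",
  "f5 i5 0 0 1 -9 1 0 2"
), raw_path)

raw  <- read.table(raw_path, header = TRUE, stringsAsFactors = FALSE)
geno <- as.matrix(raw[, -(1:6)])  # drop the six leading sample columns

# With complete data, cor() returns a positive semi-definite matrix
# (up to floating-point error).
stopifnot(!anyNA(geno))
R <- cor(geno)

min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
```

The resulting `R` keeps full double precision, unlike a matrix read back from a text file with a limited number of digits.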

I wonder why generating the LD matrix with '--r square' did not work but using the same genotype data with cor() in R worked fine.

Thanks for your help. Doug

stephens999 commented 4 years ago

Does the LD matrix generated using --r square look "similar" to the one from cor(), just with maybe fewer decimal places? I'm wondering if it is rounding error or something else. Matthew

dwightman commented 4 years ago

I am not sure of a good way to test that, but from just looking at the two matrices, the values look similar, except that the output from plink has fewer decimal places and is rounded. The only values they have exactly in common are the entries equal to 1, but the more digits I round off, the more closely the values agree.

Doug

stephens999 commented 4 years ago

Then my best guess is that it is a rounding error.
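One way to test the rounding hypothesis is to round a full-precision cor() matrix yourself and compare eigenvalues. The sketch below uses simulated genotypes, and the three decimal places are an arbitrary assumption rather than plink's actual output precision. It uses fewer samples than SNPs so the exact LD matrix has many zero eigenvalues, which is where small perturbations can tip some of them below zero.

```r
set.seed(1)
n <- 50   # samples
p <- 200  # SNPs; p > n makes the LD matrix rank-deficient

# Simulated genotypes coded 0/1/2 (hypothetical data, allele frequency 0.3).
G <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)

R_full  <- cor(G)
R_round <- round(R_full, 3)  # mimic a matrix written with few decimal places

min_eig <- function(R) {
  min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
}

# Entries differ only slightly, but the smallest eigenvalue can change sign.
c(max_abs_diff  = max(abs(R_full - R_round)),
  min_eig_full  = min_eig(R_full),
  min_eig_round = min_eig(R_round))
```

The same comparison works on real data: take `max(abs(R_plink - R_cor))` to see how far the two matrices are apart, then compare their smallest eigenvalues.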

gaow commented 4 years ago

Thanks @dwightman and @stephens999 for the updates / clarifications. I think we can consider this issue resolved.

stephens999 commented 4 years ago

@gaow @zouyuxin I suggest we have a flag for susie_rss to add the z automatically, e.g. a z_ld_weight parameter that specifies what weight should be assigned to the z in the LD matrix. It would use something like w zz' + (1-w) R, where R is the user-specified LD matrix. The default could be 0.01 or 0....
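A rough sketch of what the suggested adjustment does, written in base R and not taken from the susieR source. The helper `null_component` is hypothetical: it measures how much of z falls outside the space spanned by the non-zero eigenvectors of a matrix. The example uses two SNPs in perfect LD and a z vector that is inconsistent with that LD.

```r
# Squared length of the part of z lying in the null space of R (hypothetical helper).
null_component <- function(R, z, tol = 1e-8) {
  ev <- eigen(R, symmetric = TRUE)
  null_vecs <- ev$vectors[, ev$values <= tol, drop = FALSE]
  sum(crossprod(null_vecs, z)^2)
}

R <- matrix(1, nrow = 2, ncol = 2)  # two SNPs in perfect LD: rank 1
z <- c(2, -1)                       # not proportional to (1, 1)
w <- 0.01                           # weight given to z, as in the suggestion

# The adjustment proposed above: w zz' + (1 - w) R.
R_adj <- w * tcrossprod(z) + (1 - w) * R

null_component(R, z)      # positive: z is not in the column space of R
null_component(R_adj, z)  # zero: the adjusted matrix spans z
```

Note that `R_adj` no longer has a unit diagonal. Whether and how to rescale it (for example with `cov2cor`) is a design choice this sketch leaves open.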

RL-m commented 3 years ago

> @dwightman an LD matrix can contain negative values for negative correlations -- are you actually referring to negative eigenvalues?
>
> @zouyuxin has already updated the vignette adding a section of discussion on external reference LD panels: https://stephenslab.github.io/susieR/articles/finemapping_summary_statistics.html#using-ld-from-reference-panel please let us know if this solves your issue.
>
> Would it be possible to simply use the cor function in R to compute the LD matrix? If you are using the 1000 Genomes reference, the genotype data should not be too big to load into R and compute directly.

Hello,

Sorry, but I didn't see the section about using an external LD reference panel. Could you please point to it again?

I'm new to this field, and another thing I am confused about is how you filtered the SNPs from the reference panel. I tried to use summary data for the analysis with UK Biobank data as the reference, but the number of SNPs is too large, and the same is true of the 1000 Genomes reference. I would like to know how you decided on the LD window, and whether you have a more detailed pipeline for fine-mapping with summary statistics.

Many thanks!

zouyuxin commented 3 years ago

The susie_rss function and the vignette were updated recently. The previous section about the reference LD matrix is no longer relevant.

For fine-mapping, we check small regions of the genome (from thousands up to about 10k SNPs might be typical). So you need to break the genome into small regions, either based on LD blocks or by defining a fixed-length window around the signal. For each region, you need to match the SNPs between the summary statistics and the reference panel.
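The fixed-window variant of these steps can be sketched as below. The data, column names, and the 250 kb half-width are all made up for illustration; an LD-block-based split would replace the window step with a lookup of block boundaries.

```r
# Toy summary statistics (hypothetical data).
sumstats <- data.frame(
  snp = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  chr = 1,
  pos = c(100000, 480000, 500000, 730000, 1900000),
  z   = c(0.4, 3.1, 6.2, 2.0, -0.3),
  stringsAsFactors = FALSE
)

# SNP ids available in the reference panel, in the panel's own order.
panel_snps <- c("rs5", "rs4", "rs3", "rs1")

# Fixed-length window around the strongest signal (window size is arbitrary).
lead   <- sumstats[which.max(abs(sumstats$z)), ]
half_w <- 250000
region <- sumstats[sumstats$chr == lead$chr &
                   abs(sumstats$pos - lead$pos) <= half_w, ]

# Keep SNPs present in both sources, ordered as in the summary statistics.
shared   <- intersect(region$snp, panel_snps)
region   <- region[match(shared, region$snp), ]
panel_ix <- match(shared, panel_snps)  # rows/columns to take from the panel LD matrix
```

With a panel LD matrix `R_panel` whose rows and columns follow `panel_snps`, the matched matrix would be `R_panel[panel_ix, panel_ix]`, in the same order as `region$z`. Allele alignment between the two sources still has to be checked separately.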