stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR
Other
176 stars 46 forks source link

Residual Variance Estimate Failed - Negative Value #162

Open tianwen0003 opened 2 years ago

tianwen0003 commented 2 years ago

Hi

I just got the same error(2 out of 3000 runs) as #90 when did fine-mapping by susie_rss function.

*Error in susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) : Estimating residual variance failed: the estimated value is negative Calls: susie_rss -> susie_suff_stat Execution halted**

when set estimate_residual_variance = FALSE, the function run well but all pip are zero

Here, I provide the all files for you to test ENSG00000149084.7.vcf.gz

ENSG00000149084.7_fastqtl.zip In this file, the second column is the "slope" from fastqtl, the thrid column is the "slope_se" from fastqtl

ENSG00000149084.7_ldMatrix.zip This is the ld matrix calculated from plink --r

the sample number n = 204

var_y = 0.960491

stephens999 commented 2 years ago

we recommend to use estimate_residual_variance = FALSE with susie_rss

If the PIP=0 with this setting then this suggests there are no strong signals in the region. Do you have strong signals in the region?

tianwen0003 commented 2 years ago

sorry for my late response.

  1. Actually, the signals in the regions are not weak, instead, the signals are very strong, the nominal p-values are range from 8.134e-05 to 3.09152e-50.
  2. Do you mean to use estimate_residual_variance = FALSE for all runs or the runs reported error?
  3. Could you please provide a input-preparing guide file for fine-mapping from raw genotype and phenotype and covariable file, the link in the https://stephenslab.github.io/susie-paper/ can not be opened
pcarbo commented 2 years ago

@tianwen0003 Which link cannot be opened exactly?

tianwen0003 commented 2 years ago

Sorry, the link is image in this https://stephenslab.github.io/susie-paper/finemapping_benchmark.html

pcarbo commented 2 years ago

Thanks. It should be this: https://stephenslab.github.io/susie-paper/manuscript_results/GTEx_Association_Preprocessing.html

stephens999 commented 2 years ago

Actually, the signals in the regions are not weak, instead, the signals are very strong, the nominal p-values are range from 8.134e-05 to 3.09152e-50.

then PIP should not be 0 with estimate_residual_var = FALSE. Something is wrong. Can you try with L=1?

Do you mean to use estimate_residual_variance = FALSE for all runs or the runs reported error?

When using summary data (susie_rss) the recommendation is always FALSE. When using individual level data (susie) the recommendation is TRUE (And these are the software defaults).

tianwen0003 commented 2 years ago

then PIP should not be 0 with estimate_residual_var = FALSE. Something is wrong. Can you try with L=1?

Sorry, I checked the results, L=10 and L=1 did get non-zero PIP and 95% credible set.

When using summary data (susie_rss) the recommendation is always FALSE.

I found in the guide from https://stephenslab.github.io/susieR/articles/finemapping_summary_statistics.html, the suggestion is set estimate_residual_var = TRUE when use in-sample LD matrix, which is the data type I have. so, in my pipeline, I should set it to TRUE, right?

stephens999 commented 2 years ago

Hi

I just got the same error(2 out of 3000 runs) as #90 when did fine-mapping by susie_rss function.

*Error in susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) : Estimating residual variance failed: the estimated value is negative Calls: susie_rss -> susie_suff_stat Execution halted**

when set estimate_residual_variance = FALSE, the function run well but all pip are zero

Here, I provide the all files for you to test ENSG00000149084.7.vcf.gz

ENSG00000149084.7_fastqtl.zip In this file, the second column is the "slope" from fastqtl, the thrid column is the "slope_se" from fastqtl

ENSG00000149084.7_ldMatrix.zip This is the ld matrix calculated from plink --r

the sample number n = 204

var_y = 0.960491

can you provide the code you run to apply susie to these input files so we can reproduce the problem?

tianwen0003 commented 2 years ago

can you provide the code you run to apply susie to these input files so we can reproduce the problem?

Sure, the code and input files are follows: run_susieR.zip ENSG00000149084.7.data.zip ENSG00000149084.7.snpList.zip ENSG00000149084.7.ld.zip

gaow commented 2 years ago

I'm in the middle of something else and have to run now ... but here is what I have so far:

library(stringr)
library(susieR)
set.seed(1)
gene <- c("ENSG00000149084.7")

print(gene)
qtlResult <- read.table(file = str_c(gene,".data"),sep = "\t",header = F)
ldMatrix <- as.matrix(read.table(file = str_c(gene,".ld"),sep = " ",header = F))
ldMatrix <- ldMatrix[1:dim(ldMatrix)[1],1:dim(ldMatrix)[1]]
expression_var <- 0.960491
sampleSize <- 204
fitted_rss1 <- susie_rss(bhat = qtlResult$V2,shat = qtlResult$V3,
                         n = sampleSize,R = ldMatrix,
                         var_y = expression_var,
                         L = 10,
                         estimate_residual_variance = TRUE)

I get

WARNING: XtX is not symmetric; forcing XtX to be symmetric by replacing XtX with (XtX + t(XtX))/2

and negative estimate for residual variance. Then I tried:

isSymmetric(ldMatrix) # FALSE
krig_res = kriging_rss(qtlResult$V2/qtlResult$V3, ldMatrix, sampleSize)
print(krig_res$plot)

I get

WARNING: The matrix R is not positive semidefinite. Negative eigenvalues are set to zero.

image

Certainly something is wrong with the LD matrix that is supposed to match the genotype; however there is no large difference between observed and expected z-scores. I dont think we have your genotype data to look further into this discrepency.

pcarbo commented 2 years ago

@gaow @zouyuxin Would you recommend that they try using the methods suggested in the susie-rss bioRxiv paper to adjust their LD matrix?

stephens999 commented 2 years ago

The report was that there were problems (PIP=0) even with L=1, for which LD should be irrelevant. So can we check L=1?

zouyuxin commented 2 years ago

There is one CS using L=1, the estimated residual variance is 0.291. The CS contains 24 variants (perfectly correlated), each with PIP 0.04. With L=3, it finds 2 CSs, and the estimated residual variance is 0.242. The correlation between the top variant in each CS is 0.72. The variants in CS1 have z scores -21.55285, and the variants in CS2 have z scores around -8. I get the negative estimated residual variance error as I increase L = 4.

zouyuxin commented 2 years ago

With estimated residual variance = FALSE, we identify 1 CS with L = 1 - 10.

@pcarbo the estimated regularization parameter lambda is 0.0027, the adjusted one also gives an error at L = 4 (estimate_residual_variance = TRUE). The smallest eigenvalue for the LD matrix is -1.6. The LD matrix is from plink, and it's a known issue that there are rounding and numerical errors.

@pcarbo @gaow @tianwen0003 Do you know whether the covariate effects are removed from the genotypes in fastqtl?

gaow commented 2 years ago

@zouyuxin good point about the covariates. The genotype data does not remove the covariates from it -- that'd be one reason the kriging plot is not perfect because the z score and LD can be a bit off because of the covarites is not removed from genotypes. In any case, the data used in susie_rss here is not exactly the "in sample LD" by our definition.

stephens999 commented 2 years ago

So with estimate residual variance =False is everything ok?

On Fri, Jun 10, 2022, 14:11 gaow @.***> wrote:

@zouyuxin https://github.com/zouyuxin good point about the covariates. The genotype data does not remove the covariates from it -- that'd be one reason the kriging plot is not perfect because the z score and LD can be a bit off because of the covarites is not removed from genotypes. In any case, the data used in susie_rss here is not exactly the "in sample LD" by our definition.

— Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/162#issuecomment-1152656312, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRP5ZBK3NUTD7Q2HGGLVOOHNTANCNFSM5W4CSZDA . You are receiving this because you commented.Message ID: @.***>

stephens999 commented 2 years ago

The original report was that the Pips were all 0 in that case. Can we reproduce that?

On Fri, Jun 10, 2022, 14:24 Matthew Stephens @.***> wrote:

So with estimate residual variance =False is everything ok?

On Fri, Jun 10, 2022, 14:11 gaow @.***> wrote:

@zouyuxin https://github.com/zouyuxin good point about the covariates. The genotype data does not remove the covariates from it -- that'd be one reason the kriging plot is not perfect because the z score and LD can be a bit off because of the covarites is not removed from genotypes. In any case, the data used in susie_rss here is not exactly the "in sample LD" by our definition.

— Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/162#issuecomment-1152656312, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRP5ZBK3NUTD7Q2HGGLVOOHNTANCNFSM5W4CSZDA . You are receiving this because you commented.Message ID: @.***>

gaow commented 2 years ago

The original report was that the Pips were all 0 in that case.

it seems a false alarm according to https://github.com/stephenslab/susieR/issues/162#issuecomment-1144322104

zouyuxin commented 2 years ago

So with estimate residual variance =False is everything ok?

yes, there is no error or pip all =0 when estimate residual variance = FALSE.

stephens999 commented 2 years ago

Ok, good then it seems all is well. Maybe we should warn against using estimate residual variance =True with summary data, even with in sample ld? Or just not suggest it.... It seems difficult to be precise about the circumstances when it is ok to use?

tianwen0003 commented 2 years ago

@pcarbo @gaow @tianwen0003 Do you know whether the covariate effects are removed from the genotypes in fastqtl?

We used the PC1 and PC2 of genotype as covariants for fastqtl