stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR

missing value where TRUE/FALSE needed error with susie_rss #206

Open maguileraf opened 10 months ago

maguileraf commented 10 months ago

Hi, I am trying to run susie_rss, and for some regions it works. However, for other, larger regions, I get the following error message:

WARNING: XtX is not symmetric; forcing XtX to be symmetric by replacing XtX with (XtX + t(XtX))/2
Error in if (neg.loglik.logscale(lV, betahat = betahat, shat2 = shat2, : missing value where TRUE/FALSE needed
Calls: susie_rss ... single_effect_regression_ss -> optimize_prior_variance
In addition: There were 40 warnings (use warnings() to see them)

Any advice?

pcarbo commented 10 months ago

@maguileraf We've seen this error before; does setting estimate_prior_variance = FALSE make the error go away? If so, you can try with estimate_prior_method = "EM". Sometimes that helps.
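For anyone who wants to try both settings side by side, here is a minimal sketch on simulated data. The simulated `X`, `y`, `z`, and `R` are assumptions made up for illustration and are not the data from this issue. The two options are passed through `susie_rss` to the underlying sufficient-statistics fit, and the susieR calls are skipped if the package is not installed.

```r
# Simulated stand-in data (an assumption; not the data from this issue).
set.seed(1)
n <- 500
p <- 50
X <- scale(matrix(rnorm(n * p), n, p))
b <- numeric(p)
b[c(5, 20)] <- 0.5
y <- drop(X %*% b + rnorm(n))

# Marginal z-scores from simple linear regression, computed from the
# correlation r between each SNP and y: t = r * sqrt((n - 2) / (1 - r^2)).
r <- drop(cor(X, y))
z <- r * sqrt((n - 2) / (1 - r^2))
R <- cor(X)

if (requireNamespace("susieR", quietly = TRUE)) {
  # Option 1: keep the prior variance fixed instead of estimating it.
  fit_fixed <- susieR::susie_rss(z, R, n = n, L = 10,
                                 estimate_prior_variance = FALSE)
  # Option 2: estimate the prior variance with EM instead of the default.
  fit_em <- susieR::susie_rss(z, R, n = n, L = 10,
                              estimate_prior_method = "EM")
  print(fit_fixed$sets$cs)
  print(fit_em$sets$cs)
}
```

If the error disappears with the first call but not with the default settings, that points to the prior-variance optimization step named in the traceback.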

maguileraf commented 10 months ago

Thank you for your suggestion, @pcarbo. Unfortunately, I get the same error.

pcarbo commented 10 months ago

Which version of susieR are you using? Do you have the latest version available from GitHub?

maguileraf commented 10 months ago

0.12.35

maguileraf commented 10 months ago

I upgraded to the latest version from GitHub (0.12.40) and I keep getting the same error in certain regions.

pcarbo commented 10 months ago

@maguileraf Any chance you would be able to share the code and data for one of the regions so we can try to reproduce this error?

maguileraf commented 10 months ago

@pcarbo unfortunately, I can't. However, I used refine=T and it fixed the error.

pcarbo commented 10 months ago

That's interesting. Thanks for sharing your solution — it may be useful to others.

maguileraf commented 10 months ago

@pcarbo refine=T fixed it for certain regions, but I keep getting this error for other regions. Any other suggestions?

pcarbo commented 10 months ago

@maguileraf Can you share at least the call to susie_rss and the console output (with verbose = TRUE), and then the error message that was generated? Perhaps this will give us some clues.

maguileraf commented 10 months ago

gwas_susie <- susie_rss(gwas$Z, R = loci_r_matrix, L = 10, n = 616211, estimate_residual_variance = FALSE, refine=T, verbose = TRUE)
HINT: For large R or large XtX, consider installing the Rfast package for better performance.
WARNING: XtX is not symmetric; forcing XtX to be symmetric by replacing XtX with (XtX + t(XtX))/2
Error in if (neg.loglik.logscale(lV, betahat = betahat, shat2 = shat2, : missing value where TRUE/FALSE needed
In addition: There were 40 warnings (use warnings() to see them)

pcarbo commented 10 months ago

Can you share the warnings? There might also be some inconsistencies between z and R which you can check by following the guidance in this vignette.
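Before running the vignette diagnostics, a few basic input checks may help narrow things down, since missing or non-finite values in z or R are one plausible way to end up with an NA inside an `if` condition. The sketch below is only an illustration: `check_rss_inputs` is a hypothetical helper (not part of susieR), and the toy inputs are made up. The guarded part at the end calls `estimate_s_rss` and `kriging_rss`, the susieR functions used for the z/R consistency check, on small simulated data.

```r
# Hypothetical helper (not part of susieR): list basic problems in z and R.
check_rss_inputs <- function(z, R, tol = 1e-8) {
  problems <- character(0)
  if (!is.matrix(R) || nrow(R) != ncol(R)) {
    problems <- c(problems, "R is not a square matrix")
  } else {
    if (length(z) != nrow(R))
      problems <- c(problems, "length(z) differs from nrow(R)")
    if (any(!is.finite(R))) {
      problems <- c(problems, "R contains NA, NaN, or infinite entries")
    } else if (max(abs(R - t(R))) > tol) {
      problems <- c(problems, "R is not symmetric")
    }
    if (any(abs(diag(R) - 1) > tol, na.rm = TRUE))
      problems <- c(problems, "diag(R) is not all 1")
  }
  if (any(!is.finite(z)))
    problems <- c(problems, "z contains NA, NaN, or infinite entries")
  problems
}

# Toy inputs with two deliberate problems: asymmetric R and a missing z-score.
R_bad <- diag(3)
R_bad[1, 2] <- 0.30
R_bad[2, 1] <- 0.31
z_bad <- c(1.2, NA, -0.5)
print(check_rss_inputs(z_bad, R_bad))

# Consistency diagnostics from susieR, run on small simulated data
# (skipped if susieR is not installed).
if (requireNamespace("susieR", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  X <- matrix(rnorm(n * 10), n, 10)
  y <- X[, 3] + rnorm(n)
  r <- drop(cor(X, y))
  z_ok <- r * sqrt((n - 2) / (1 - r^2))
  R_ok <- cor(X)
  print(susieR::estimate_s_rss(z_ok, R_ok, n = n))   # near 0 when consistent
  diag_out <- susieR::kriging_rss(z_ok, R_ok, n = n)
  print(head(diag_out$conditional_dist))
}
```

On real data, replace the toy inputs with the z-score vector and LD matrix passed to `susie_rss`, and print `warnings()` right after the failing call to capture the 40 warnings.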

maguileraf commented 10 months ago

I think it's an inconsistency between z and R. I am using UKB data to run a case/control GWAS using REGENIE. I've used the same file to generate the LD matrix, but I still get inconsistencies. Do you think the covariates used when running the GWAS can cause these discrepancies?

pcarbo commented 10 months ago

With REGENIE it is hard to say because it is not the same model as susie. Our general recommendation would be to use a method that uses a similar model to susie (e.g., the linear regression method in PLINK). For case-control data, a linear regression approach might be preferable to logistic regression.

Regarding covariates, you may need to regress out the covariates from the genotype matrix (or LD matrix) if the covariates are correlated with the SNPs.
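One way to read this suggestion, sketched with base R only: project the covariates (plus an intercept) out of each genotype column, then compute the LD matrix from the residualized genotypes. The genotype matrix `X`, the covariate matrix `covars`, and their sizes are simulated assumptions for illustration, not a description of how susieR handles covariates internally.

```r
# Simulated stand-ins (assumptions): n samples, p SNPs, two covariates,
# with the SNPs deliberately correlated with one covariate.
set.seed(2)
n <- 300
p <- 8
covars <- cbind(age = rnorm(n), pc1 = rnorm(n))
X <- matrix(rnorm(n * p), n, p) + 0.5 * covars[, "pc1"]

# Residualize every SNP column on the covariates plus an intercept.
design <- cbind(intercept = 1, covars)
X_res <- qr.resid(qr(design), X)

# LD matrix before and after removing the covariates.
R_raw <- cor(X)
R_adj <- cor(X_res)
print(round(R_raw[1:3, 1:3], 2))
print(round(R_adj[1:3, 1:3], 2))
```

In this toy example the off-diagonal entries of `R_raw` are inflated by the shared covariate, while those of `R_adj` are close to zero, which is the kind of discrepancy that can make z and R disagree.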

maguileraf commented 9 months ago

@pcarbo I am still trying to make this work. I have access to the individual-level data, but my cases and controls are not balanced. Does it make sense to use susie instead of susie_rss?

pcarbo commented 9 months ago

@maguileraf Not knowing the full details of your analysis, it is hard to say for sure, but broadly speaking, the use of the full-data linear regression model for case-control data has been better studied and has stronger accuracy guarantees than the summary-data approach, so yes if you are able to use susie instead of susie_rss, that would likely give you better results. Hope that helps.
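For reference, a minimal sketch of the individual-level route, treating the 0/1 case-control status as the response in the linear model. Everything here is simulated and assumed for illustration (sample size, 10% case fraction, one causal SNP); it is not a recommendation for any specific analysis, and the `susie` call is skipped if susieR is not installed.

```r
# Simulated, unbalanced case-control data (assumptions for illustration).
set.seed(3)
n <- 1000
p <- 30
X <- matrix(rnorm(n * p), n, p)
liability <- 0.8 * X[, 7] + rnorm(n)
# Top 10% of liability are cases (1), the rest are controls (0).
y <- as.numeric(liability > quantile(liability, 0.9))
print(table(y))

if (requireNamespace("susieR", quietly = TRUE)) {
  # Fit susie directly on the genotype matrix and the 0/1 outcome.
  fit <- susieR::susie(X, y, L = 10)
  print(fit$sets$cs)
  print(which.max(fit$pip))
}
```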

yningvu commented 5 months ago

Hi there,

I had the same issue while running with ref ld matrix. I think it is due to the inconsistency in dimensions between the sumstats file and the ld matrix.

For example, if you select a locus of a snp of interest, and then compute the ld matrix in a reference, it is very likely that some snps within the region of the locus are not included in the reference panel. Thus, you might have 2000 rows in the locus, but 1000 rows in the ld matrix.
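A minimal sketch of how the two inputs could be aligned before calling `susie_rss`, assuming the summary statistics carry a SNP identifier column and the LD matrix has matching row and column names. The names `sumstats`, `snp`, `R_ref`, and the toy values are all made up for illustration.

```r
# Toy summary statistics: six SNPs in the locus (made-up values).
sumstats <- data.frame(
  snp = paste0("rs", 1:6),
  z   = c(0.4, 5.1, 4.8, -0.2, 1.3, 0.9),
  stringsAsFactors = FALSE
)

# Toy reference LD matrix: only some locus SNPs are in the panel,
# and one panel SNP (rs9) is absent from the summary statistics.
ld_snps <- c("rs2", "rs3", "rs5", "rs9")
R_ref <- diag(length(ld_snps))
dimnames(R_ref) <- list(ld_snps, ld_snps)
R_ref["rs2", "rs3"] <- R_ref["rs3", "rs2"] <- 0.9

# Keep only SNPs present in both inputs, in one shared order.
shared <- intersect(sumstats$snp, rownames(R_ref))
z_aligned <- sumstats$z[match(shared, sumstats$snp)]
R_aligned <- R_ref[shared, shared, drop = FALSE]

# z and R must describe the same SNPs in the same order.
stopifnot(length(z_aligned) == nrow(R_aligned))
print(shared)
```

Matching by identifier alone does not catch allele flips between the GWAS and the reference panel, so the effect alleles still need to be checked separately.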

pcarbo commented 5 months ago

@yningvu Can you please share your call to susie_rss and the exact error message produced?