xinhe-lab / ctwas

package for the causal TWAS project
https://xinhe-lab.github.io/ctwas/
MIT License
32 stars 12 forks source link

Unclear error message #2

Closed ctggroup closed 1 year ago

ctggroup commented 1 year ago

I am getting the following error:

res <- impute_expr_z(z_snp = z_snp, weight = weight.predictdb, ld_pgenfs = ld_pgenfs, method = "lasso", outputdir = outputdir, outname = "res_ss") 2022-09-30 14:52:23 INFO::flipping z scores to match LD reference 2022-09-30 14:52:32 INFO::will also flip weights to match LD reference for each gene 2022-09-30 14:52:32 INFO::Reading weights for chromosome 1 2022-09-30 14:52:33 INFO::number of genes with weights provided: 13199 2022-09-30 14:52:33 INFO::collecting gene weight information ...

2022-09-30 15:11:27 INFO::Start gene z score imputation ... 2022-09-30 15:11:27 INFO::ld genotype is given, using genotypes to impute gene z score. Error in if (abs(max(gexpr) - min(gexpr)) < 1e-08) { : missing value where TRUE/FALSE needed

It occurs from the file 'twas_impute_expr_z.R, lines 98-107. It is unclear to me what is generating this error and I was hoping you could help.

simingz commented 1 year ago

what z scores and weights are you using? I think one reason that this may happen is there is no imputable genes, e.g. SNPs in the prediction model and SNPs with z scores do not match

ctggroup commented 1 year ago

I am using z scores from imputed UK Biobank data. I was using .db weight files from GTEx V8 downloaded directly from http://predictdb.org/ I have realised that these use different references - GRCh37 and GRCh38 so I need to lift my z-score SNP ids from GRCh37 to GRCh38.

However, as a test, I tried the example provided with the package and it completed fine with the same results are those listed in the workflow example.

Then I tried with the old GTEx .db files (v7) that were GRCh37 and I checked for SNP overlap between these .db files and the GWAS z-score file I am using. Of the 163581 rows in the .db file for whole blood, 150907 overlap where the rsIDs match across the two files. I know that all of the rsIDs of the z-scores match the .pgen files I am supplying as an LD reference. Yet when I run this analysis - the output informs me that 0 genes have been imputed for each chromosome and all .expr files are empty. I am sorry to ask more of your time, but could you possibly think why this is happening?

simingz commented 1 year ago

Not sure about this but maybe you can check a few things: which GTEx .db files are you using? Is the format the same as predictdb v8 files? Can you also check if the format and header of your input files match examples in the tutorial?

sunyeoplee commented 1 year ago

I get the similar error only when I am using my custom LD reference panel. When I use the LD reference panel provided by TWA FUSION website (http://gusevlab.org/projects/fusion/; https://data.broadinstitute.org/alkesgroup/FUSION/LDREF.tar.bz2), the analysis runs without any error. What would be the possible issue? Above, you suspected whether FUSION weight SNPs (SNPs in the prediction model) and outcome summary GWAS SNPs (SNPs with z scores) do not match. But, in my case, everything works with the FUSION LD reference panel but I get the following error when I use my own LD reference panel (1000Genome Phase 3 hg38).

One difference is that the FUSION panel has a smaller number of SNPs (e.g., 55464 for chr 9) compared to my own panel (e.g., 3384360 for chr 9).

Error:

2023-05-26 17:48:57 INFO::Start gene z score imputation ... 2023-05-26 17:48:57 INFO::ld genotype is given, using genotypes to impute gene z score. 2023-05-26 17:51:04 INFO::Start gene z score imputation ... 2023-05-26 17:51:04 INFO::ld genotype is given, using genotypes to impute gene z score. [[1]] [1] "Error in if (abs(max(gexpr) - min(gexpr)) < 1e-08) { : \n missing value where TRUE/FALSE needed\n" attr(,"class") [1] "try-error" attr(,"condition") <simpleError in if (abs(max(gexpr) - min(gexpr)) < 1e-08) { exprlist[[gname]] <- NULL} else { z.s <- as.matrix(z_snp[z.idx, "z"]) var.s <- sqrt(apply(X.g, 2, var)) Gamma.g <- cov(X.g) z.g <- (t(wgt) var.s) %% z.s/sqrt(t(wgt) %% Gamma.g %% wgt) exprlist[[gname]][["expr"]] <- gexpr exprlist[[gname]][["z.g"]] <- z.g}: missing value where TRUE/FALSE needed>

wesleycrouse commented 1 year ago

@sunyeoplee do you know if there are many invariant SNPs in your reference panel? The condition that is being checked is related to genes that are invariant after imputation. There may be some case that we are missing

sunyeoplee commented 1 year ago

I see that removing SNPs with zero MAF in the reference panel solves the issue. I also saw that the causal TWAS paper used MAF>0.05 threshold. Do you recommend this threshold compared to 0.01 or other thresholds? Thank you for helping out.

I see that the threshold affects the estimated gene prior and SNP prior quite a bit,

wesleycrouse commented 1 year ago

@sunyeoplee We used both a 0.05 and 0.01 threshold for the paper, depending on the data source. I think that lower MAF threshold is better as long as the computation is not prohibitive. My expectation is that changing the MAF should change the SNP prior (because there are fewer SNPs) but that the gene prior should not change much