privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/
183 stars 43 forks source link

LDpred2: running auto model gives NAs for all beta_est #495

Closed SaraBecelaere closed 2 months ago

SaraBecelaere commented 2 months ago

Dear Florian,

I’m using LDpred2 to calculate PGS. I’ve been able to use the infinitesimal and grid model successfully. However, when I’m trying to run the auto grid and auto model, I’m running in some issues. I’ve done the following:

coef_shrink <- 0.95
multi_auto <- snp_ldpred2_auto(corr, info_snp, h2_init = ldsc_h2_est, vec_p_init = seq_log(1e-4, 0.2, length.out = 30), ncores = 2, #use_MLE = FALSE,  allow_jump_sign = FALSE, shrink_corr = coef_shrink)

When I then check multi_auto[[1]], I see that there are NA’s everywhere (all beta_est, postp_est, corr_est are NA). When I change coef_shrink to 0.8 or 0.5, I still get the same result. I have also tried to set use_MLE = FALSE, and vary the coef_shrink parameter, but see no change. Any suggestions about what might be going wrong?

Below you can also see how I created corr, info_snp and ldsc_h2_est:

obj.bigSNP <- snp_attach("data.rds"))
G   <- obj.bigSNP$genotypes
CHR <- obj.bigSNP$map$chromosome
POS <- obj.bigSNP$map$physical.pos
y   <- obj.bigSNP$fam$affection
map_test <- setNames(obj.bigSNP$map[-3], c("chr", "rsid", "pos", "a1", "a0"))
map_ldref < -readRDS(runonce::download_file("https://figshare.com/ndownloader/files/37802721", dir = "tmp-data"), fname = "map_hm3_plus.rds"))
map_ldref <- as.data.frame(map_ldref)
in_test <- vctrs::vec_in(map_ldref[, c("rsid")], map_test[, "rsid"])
map_ldref2 <- map_ldref[in_test, ]
info_snp <- snp_match(sumstats, map_ldref2, join_by_pos = FALSE)
tmp <- tempfile(tmpdir = "tmp-data"))
for (chr in 1:22) {
cat(chr, ".. ", sep = "")
ind.chr <- which(info_snp$chr == chr)
ind.chr2 <- info_snp$`_NUM_ID_`[ind.chr]
ind.chr3 <- match(ind.chr2, which(map_ldref2$chr == chr))
corr_chr <- readRDS(paste0("/ldref_hm3_plus/LD_with_blocks_chr", chr, ".rds"))[ind.chr3, ind.chr3]
  if (chr == 1) {
corr <- as_SFBM(corr_chr, tmp, compact = TRUE)
} else {
corr$add_columns(corr_chr, nrow(corr))
}}
ldsc <- with(info_snp, snp_ldsc(ld, nrow(map_ldref2), chi2 = (beta / beta_se)^2,
                                sample_size = n_eff, blocks = NULL))
ldsc_h2_est <- ldsc[["h2"]]

Thanks in advance for the help.

Best, Sara

privefl commented 2 months ago

Can you show me the QC plot comparing standard deviations?

SaraBecelaere commented 2 months ago

Hi Florian,

Below the QC plot:

QC_plot_sd_comparison

privefl commented 2 months ago
SaraBecelaere commented 2 months ago
privefl commented 2 months ago
SaraBecelaere commented 2 months ago
map_ldref <- readRDS(runonce::download_file("https://figshare.com/ndownloader/files/37802721", dir = "tmp-data", fname = "map_hm3_plus.rds"))
info_snp <- snp_match(sumstats, map_ldref, join_by_pos = FALSE)
obj.bigSNP <- snp_attach(“data.rds"))
G   <- obj.bigSNP$genotypes
CHR <- obj.bigSNP$map$chromosome
POS <- obj.bigSNP$map$physical.pos
y   <- obj.bigSNP$fam$affection
map_test <- setNames(obj.bigSNP$map[-3], c("chr", "rsid", "pos", "a1", "a0"))
in_test <- vctrs::vec_in(info_snp[, c("rsid")], map_test[, "rsid"])
df_beta <- info_snp[in_test, ]

Creating correlation matrix:

tmp <- tempfile(tmpdir = "tmp-data")
for (chr in 1:22) {
cat(chr, ".. ", sep = "")
ind.chr <- which(info_snp$chr == chr)
ind.chr2 <- info_snp$`_NUM_ID_`[ind.chr]
ind.chr3 <- match(ind.chr2, which(map_ldref2$chr == chr))
corr_chr <- readRDS(paste0("/ldref_hm3_plus/LD_with_blocks_chr", chr, ".rds"))[ind.chr3, ind.chr3]
  if (chr == 1) {
corr <- as_SFBM(corr_chr, tmp, compact = TRUE)
} else {
corr$add_columns(corr_chr, nrow(corr))
}}
df_beta2 <- snp_match(df_beta, map_test, join_by_pos = FALSE)

Getting h2:

ldsc <- with(df_beta, snp_ldsc(ld, nrow(map_ldref), chi2 = (beta / beta_se)^2, sample_size = n_eff, blocks = NULL))
ldsc_h2_est <- ldsc[["h2"]]

Running auto model:

coef_shrink <- 0.95  
multi_auto <- snp_ldpred2_auto(corr, df_beta, h2_init = ldsc_h2_est, vec_p_init = seq_log(1e-4, 0.2, length.out = 30), ncores = NCORES, # use_MLE = FALSE, allow_jump_sign = FALSE, shrink_corr = coef_shrink)
privefl commented 2 months ago

I have used these GWAS summary statistics in a previous paper, and it worked just fine. Cf. this code, where you can skip the ancestry part at the beginning, and the annotation part at the end.

SaraBecelaere commented 2 months ago

Hi Florian,

With your code I now got beta values! Thank you!

However, I’m now struggling to get predictions for my data, since the amount of beta’s and the amount of SNPs in my data are different. I tried the following:

obj.bigSNP <- snp_attach("data.rds")
G   <- obj.bigSNP$genotypes
CHR <- obj.bigSNP$map$chromosome
POS <- obj.bigSNP$map$physical.pos
y   <- obj.bigSNP$fam$affection
map_test <- setNames(obj.bigSNP$map[-3], c("chr", "rsid", "pos", "a1", "a0"))
df_beta2 <- data.frame(df_beta) #df_beta here is the same as in the code you referred me to
in_test_2 <- vctrs::vec_in(df_beta2[, c("rsid")], map_test[, "rsid"])
df_beta3 <- df_beta2[in_test_2, ]
info_snp3 <- snp_match(sumstats, df_beta3, join_by_pos = FALSE)
pred_auto_grid <- big_prodMat(G, beta_auto_grid, ind.col = info_snp3[["_NUM_ID_"]])

Then I get the following error:

Error: Incompatibility between dimensions. 'ind.col' and 'rows_along(A.col)' should have the same length.

Do you have any idea how to solve this?

privefl commented 2 months ago

Have a look at this example code.

SaraBecelaere commented 2 months ago

I've tried the example code, but then I get the error below when trying the following code

pred_auto <- big_prodVec(G, beta_auto_grid * map_pgs2$beta,
                         ind.col = map_pgs2[["_NUM_ID_"]],
                         ncores = NCORES)

Error: Incompatibility between dimensions. 'y.col' and 'ind.col' should have the same length.

FYI, beta_auto_grid contains 1,008,330 lines and 50 columns; map_pgs2 contains 73,197 lines (I'm using a subset of my data to test the code, hence the low number of SNPs that match).

privefl commented 2 months ago

beta_auto_grid seems to be a matrix, so you should use big_prodMat instead.

SaraBecelaere commented 2 months ago

Ah yes, I overlooked that!

However, I still get this error:

Error: Incompatibility between dimensions. 'ind.col' and 'rows_along(A.col)' should have the same length.

Here's the code I used:

pred_auto <- big_prodMat(G, beta_auto_grid * map_pgs2$beta, ind.col = map_pgs2[["_NUM_ID_"]], ncores = 1)

privefl commented 2 months ago

I guess you didn't filter on the variants in your test set beforehand (cf. this code).

Then you might need to use something like beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ] instead.

SaraBecelaere commented 2 months ago

I see that when using beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ] the number of lines is the same as in map_pgs2 now. However, I get this error now:

Error in big_prodMat(G, beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ], ind.col = map_pgs2[["_NUM_ID_.ss"]],  :
  all(ind.col <= ncol(X)) is not TRUE

Below the code used:

pred_auto <- big_prodMat(G, beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ], ind.col = map_pgs2[["_NUM_ID_.ss"]], ncores = 1)

privefl commented 2 months ago

Do not modify ind.col.

SaraBecelaere commented 2 months ago

I wasn't sure whether you meant don't modify ind.col from previous code or don't add in the function. So I tried both and both give an error

privefl commented 2 months ago

I meant the second one; leave it as before. I thought beta_auto_grid was a matrix; is it a data frame or a list?

SaraBecelaere commented 2 months ago

Yes, beta_auto_grid is a data frame, because I saved it from a previous run.

I transformed it into a matrix and now it seems to work! Thank you so much for your help with this!