Closed SaraBecelaere closed 2 months ago
Can you show me the QC plot comparing standard deviations?
Hi Florian,
Below the QC plot:
Are you removing everything? Or skipping this QC step?
Is this a GWAS for a binary or continuous phenotype?
I skipped this QC step
This is a GWAS for height, so continuous
The summary statistics I used are from Yengo et al. 2022, the European summary stats. I used rsID’s instead of chr:pos since the samples I want to calculate the scores for are in build 38.
However, I doubt whether the GWAS is the issue here, since at some point I was able to run the auto model on a subset of the data (that had less SNPs). However, when using the full dataset I got errors, which seemed weird to me. That is why I changed the code (above) to taking an overlap of the SNPs between the HapMap dataset, GWAS and plink data before running the correlation matrix. Below the code I used on the subset, which didn’t work on the full dataset:
map_ldref <- readRDS(runonce::download_file("https://figshare.com/ndownloader/files/37802721", dir = "tmp-data", fname = "map_hm3_plus.rds"))
info_snp <- snp_match(sumstats, map_ldref, join_by_pos = FALSE)
obj.bigSNP <- snp_attach(“data.rds"))
G <- obj.bigSNP$genotypes
CHR <- obj.bigSNP$map$chromosome
POS <- obj.bigSNP$map$physical.pos
y <- obj.bigSNP$fam$affection
map_test <- setNames(obj.bigSNP$map[-3], c("chr", "rsid", "pos", "a1", "a0"))
in_test <- vctrs::vec_in(info_snp[, c("rsid")], map_test[, "rsid"])
df_beta <- info_snp[in_test, ]
Creating correlation matrix:
tmp <- tempfile(tmpdir = "tmp-data")
for (chr in 1:22) {
cat(chr, ".. ", sep = "")
ind.chr <- which(info_snp$chr == chr)
ind.chr2 <- info_snp$`_NUM_ID_`[ind.chr]
ind.chr3 <- match(ind.chr2, which(map_ldref2$chr == chr))
corr_chr <- readRDS(paste0("/ldref_hm3_plus/LD_with_blocks_chr", chr, ".rds"))[ind.chr3, ind.chr3]
if (chr == 1) {
corr <- as_SFBM(corr_chr, tmp, compact = TRUE)
} else {
corr$add_columns(corr_chr, nrow(corr))
}}
df_beta2 <- snp_match(df_beta, map_test, join_by_pos = FALSE)
Getting h2:
ldsc <- with(df_beta, snp_ldsc(ld, nrow(map_ldref), chi2 = (beta / beta_se)^2, sample_size = n_eff, blocks = NULL))
ldsc_h2_est <- ldsc[["h2"]]
Running auto model:
coef_shrink <- 0.95
multi_auto <- snp_ldpred2_auto(corr, df_beta, h2_init = ldsc_h2_est, vec_p_init = seq_log(1e-4, 0.2, length.out = 30), ncores = NCORES, # use_MLE = FALSE, allow_jump_sign = FALSE, shrink_corr = coef_shrink)
I have used these GWAS summary statistics in a previous paper, and it worked just fine. Cf. this code, where you can skip the ancestry part at the beginning, and the annotation part at the end.
Hi Florian,
With your code I now got beta values! Thank you!
However, I’m now struggling to get predictions for my data, since the amount of beta’s and the amount of SNPs in my data are different. I tried the following:
obj.bigSNP <- snp_attach("data.rds")
G <- obj.bigSNP$genotypes
CHR <- obj.bigSNP$map$chromosome
POS <- obj.bigSNP$map$physical.pos
y <- obj.bigSNP$fam$affection
map_test <- setNames(obj.bigSNP$map[-3], c("chr", "rsid", "pos", "a1", "a0"))
df_beta2 <- data.frame(df_beta) #df_beta here is the same as in the code you referred me to
in_test_2 <- vctrs::vec_in(df_beta2[, c("rsid")], map_test[, "rsid"])
df_beta3 <- df_beta2[in_test_2, ]
info_snp3 <- snp_match(sumstats, df_beta3, join_by_pos = FALSE)
pred_auto_grid <- big_prodMat(G, beta_auto_grid, ind.col = info_snp3[["_NUM_ID_"]])
Then I get the following error:
Error: Incompatibility between dimensions. 'ind.col' and 'rows_along(A.col)' should have the same length.
Do you have any idea how to solve this?
Have a look at this example code.
I've tried the example code, but then I get the error below when trying the following code
pred_auto <- big_prodVec(G, beta_auto_grid * map_pgs2$beta,
ind.col = map_pgs2[["_NUM_ID_"]],
ncores = NCORES)
Error: Incompatibility between dimensions. 'y.col' and 'ind.col' should have the same length.
FYI, beta_auto_grid contains 1,008,330 lines and 50 columns; map_pgs2 contains 73,197 lines (I'm using a subset of my data to test the code, hence the low number of SNPs that match).
beta_auto_grid
seems to be a matrix, so you should use big_prodMat
instead.
Ah yes, I overlooked that!
However, I still get this error:
Error: Incompatibility between dimensions. 'ind.col' and 'rows_along(A.col)' should have the same length.
Here's the code I used:
pred_auto <- big_prodMat(G, beta_auto_grid * map_pgs2$beta, ind.col = map_pgs2[["_NUM_ID_"]], ncores = 1)
I guess you didn't filter on the variants in your test set beforehand (cf. this code).
Then you might need to use something like beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ]
instead.
I see that when using beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ]
the number of lines is the same as in map_pgs2 now. However, I get this error now:
Error in big_prodMat(G, beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ], ind.col = map_pgs2[["_NUM_ID_.ss"]], :
all(ind.col <= ncol(X)) is not TRUE
Below the code used:
pred_auto <- big_prodMat(G, beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ], ind.col = map_pgs2[["_NUM_ID_.ss"]], ncores = 1)
Do not modify ind.col
.
I wasn't sure whether you meant don't modify ind.col from previous code or don't add in the function. So I tried both and both give an error
pred_auto <- big_prodMat(G, beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ], ncores = 1)
Error: Incompatibility between dimensions.
'ind.col' and 'rows_along(A.col)' should have the same length.
pred_auto <- big_prodMat(G, beta_auto_grid[map_pgs2[["_NUM_ID_.ss"]], ], ind.col = map_pgs2[["_NUM_ID_"]], ncores = 1)
Error in { :
task 1 failed - "Not compatible with requested type: [type=list; target=double]."
I meant the second one; leave it as before.
I thought beta_auto_grid
was a matrix; is it a data frame or a list?
Yes, beta_auto_grid
is a data frame, because I saved it from a previous run.
I transformed it into a matrix and now it seems to work! Thank you so much for your help with this!
Dear Florian,
I’m using LDpred2 to calculate PGS. I’ve been able to use the infinitesimal and grid model successfully. However, when I’m trying to run the auto grid and auto model, I’m running in some issues. I’ve done the following:
When I then check multi_auto[[1]], I see that there are NA’s everywhere (all beta_est, postp_est, corr_est are NA). When I change coef_shrink to 0.8 or 0.5, I still get the same result. I have also tried to set use_MLE = FALSE, and vary the coef_shrink parameter, but see no change. Any suggestions about what might be going wrong?
Below you can also see how I created corr, info_snp and ldsc_h2_est:
Thanks in advance for the help.
Best, Sara