privefl / paper-ldpred2

Paper describing LDpred2
https://doi.org/10.1093/bioinformatics/btaa1029

big_prodVec or big_prodMat? #11

Closed: Janina24 closed this issue 2 years ago

Janina24 commented 2 years ago

Hi Florian,

When running big_prodVec(G_imp, beta_auto, ind.col = df_beta[["_NUMID"]]), I get the error message: Error in pMatVec4(X, y.col, ind.row, ind.col, ncores = ncores) : Tested 501101 < 501100. Subscript out of bounds. 501,100 is the number of variants in G. G_imp comes from snp_fastImputeSimple(G).

When running pred_auto <- big_prodMat(G_imp, beta_auto, ind.col = df_beta[["_NUMID"]]), I get a matrix containing only NAs.

Do you have any advice for me?

Thanks in advance!

privefl commented 2 years ago

ind.col seems to contain values outside the range 1 to ncol(G).
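A quick way to check this is to inspect the index vector before calling big_prodVec(). The sketch below uses toy values only: n_col stands in for ncol(G) and ind for the indices taken from df_beta; both names are placeholders for illustration, not objects from the thread.

```r
# Toy stand-ins: n_col plays the role of ncol(G), ind the role of the
# column indices taken from df_beta (placeholder values, not real data).
n_col <- 501100L
ind <- c(1L, 250000L, 501101L)

# Smallest and largest requested column index
range(ind)

# Positions in ind that are missing or fall outside 1..n_col
bad <- which(is.na(ind) | ind < 1L | ind > n_col)
ind[bad]
```

Any value reported by the last line would trigger the "Subscript out of bounds" error when passed as ind.col.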

Janina24 commented 2 years ago

I think the problem is that nrow(df_beta) corresponds to the number of variants overlapping between the test data, the reference data and summary statistics. So I need to restrict variants of G to the same set of SNPs. Do you provide a solution for that?

privefl commented 2 years ago

Try matching the variants in the $map of your genotype data with those in df_beta, e.g. with vctrs::vec_match().

Otherwise, IIRC, there is another issue here mentioning three different datasets.
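A minimal sketch of such a matching step, assuming both tables carry the same variant identifiers. The column names chr, pos, a0 and a1 and all values are hypothetical; the real $map of a bigSNP object uses different column names, so they would need renaming first.

```r
library(vctrs)

# Hypothetical toy tables; real column names depend on your data.
map <- data.frame(
  chr = c(1L, 1L, 1L, 2L, 2L),
  pos = c(100L, 200L, 300L, 150L, 250L),
  a0  = c("A", "C", "G", "T", "A"),
  a1  = c("G", "T", "A", "C", "C"),
  stringsAsFactors = FALSE
)
df_beta <- data.frame(
  chr  = c(1L, 2L, 2L),
  pos  = c(200L, 150L, 999L),
  a0   = c("C", "T", "A"),
  a1   = c("T", "C", "G"),
  beta = c(0.10, -0.20, 0.05),
  stringsAsFactors = FALSE
)

key <- c("chr", "pos", "a0", "a1")

# For each row of df_beta, the row number of the identical variant in map
# (NA when the variant is absent from map)
ind <- vec_match(df_beta[key], map[key])
ind
```

Note that this is exact row matching: it does not handle swapped alleles or strand flips, which snp_match() from bigsnpr is designed to deal with.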

Janina24 commented 2 years ago

G: genotype data from our own study (~500,100 SNPs).

df_beta: obtained by matching variants in the summary statistics with variants in the LD reference (HapMap3), restricted to variants also included in the test data from our study. This leaves ~155,555 variants for calculating the correlation matrix and the betas.

But when calculating predictions I have to restrict my genotype data G to the same variant set.

Or is this a wrong assumption of mine?

privefl commented 2 years ago

You have to match the variants in df_beta (the ones used in the prediction) to the ones you have in G, and pass the resulting column indices of G as ind.col.
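As a rough illustration of the prediction step with matched indices, here is a toy example using a small FBM in place of the real genotype matrix. The values of G, beta and ind are made up; ind is assumed to hold, for each effect size, the column of G it corresponds to (NA when the variant is not in G).

```r
library(bigstatsr)

# Toy genotype-like matrix: 4 individuals x 6 variants (placeholder values)
G <- FBM(4, 6, init = rep(c(0, 1, 2), 8))

# Hypothetical effect sizes and their matched column indices in G
beta <- c(0.10, -0.20, 0.05)
ind  <- c(2L, 4L, NA)

# Drop effects whose variant could not be matched to a column of G
keep <- !is.na(ind)

# One score per individual, using only the matched columns of G
pred <- big_prodVec(G, beta[keep], ind.col = ind[keep])

# Same result as the plain matrix product on the selected columns
all.equal(pred, drop(G[, ind[keep]] %*% beta[keep]))
```

The key point is that ind.col indexes columns of G, so its values must lie between 1 and ncol(G), and the vector of effects must have the same length and order as ind.col.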

Also, if you do not have imputed data, I would recommend directly using your own set of variants (instead of HM3) and computing the LD reference from that; otherwise the overlap is very small.

Janina24 commented 2 years ago

Thank you, it worked well using my own data for the correlation matrix :)