Janina24 closed this issue 2 years ago.
`ind.col` seems to be outside the range 1 to `ncol(G)`
I think the problem is that `nrow(df_beta)` corresponds to the number of variants shared by the test data, the reference data and the summary statistics. So I need to restrict the variants in `G` to the same set of SNPs. Do you provide a solution for that?
Try matching the variants in `$map` with those in `df_beta`, e.g. with `vctrs::vec_match()`.
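A minimal sketch of that matching on toy data, assuming both tables carry chromosome, position and alleles under the same column names (`chr`, `pos`, `a1`, `a0`; in practice the columns of `$map` would need renaming first). Matching on the alleles as well means that a variant whose alleles are swapped between the two tables is left unmatched (NA) rather than having its effect flipped; handling such flips is what bigsnpr's `snp_match()` does, and it is not shown here.

```r
library(vctrs)

# Hypothetical toy tables. `map` stands in for the genotype map of G
# (e.g. obj.bigSNP$map renamed to chr/pos/a1/a0); `df_beta` stands in for
# the variants kept after matching the sumstats with the LD reference.
map <- data.frame(
  chr = c(1L, 1L, 1L, 2L, 2L),
  pos = c(100L, 200L, 300L, 150L, 250L),
  a1  = c("A", "C", "G", "T", "A"),
  a0  = c("G", "T", "A", "C", "C"),
  stringsAsFactors = FALSE
)
df_beta <- data.frame(
  chr  = c(1L, 2L, 2L),
  pos  = c(300L, 150L, 999L),
  a1   = c("G", "T", "A"),
  a0   = c("A", "C", "G"),
  beta = c(0.1, -0.2, 0.05),
  stringsAsFactors = FALSE
)

# vec_match() compares data-frame rows: for each variant of df_beta it returns
# the row of `map` (i.e. the column of G) with the same chr/pos/a1/a0, or NA.
key <- c("chr", "pos", "a1", "a0")
ind_in_G <- vec_match(df_beta[key], map[key])
ind_in_G
#> [1]  3  4 NA
```

The resulting `ind_in_G` can then serve as `ind.col` for `G`, after dropping the NAs together with the corresponding rows of `df_beta`.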
Otherwise, IIRC, there is another issue here mentioning three different datasets.
- `G`: genotype data from our own study (~500,100 SNPs).
- `df_beta`: obtained by matching the variants in the summary statistics with those in the LD reference (HapMap3), restricted to variants also present in the test data from our study; ~155,555 variants remain for computing the correlation matrix and the betas.
But when calculating predictions, I have to restrict my genotype data `G` to the same set of variants. Or is this assumption wrong?
You have to match the variants in `df_beta` (the ones used in the prediction) to the ones you have in `G`.
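A sketch of the prediction step once the matching is done, on toy data: `ind_in_G` is a hypothetical vector giving, for each row of `df_beta`, the matching column of `G` (NA when the variant is absent from `G`), and `beta` stands in for `beta_auto` in the same row order. Unmatched variants are dropped from both vectors together so that `ind.col` and the effect sizes stay aligned.

```r
library(bigstatsr)

# Hypothetical inputs: a small genotype matrix (individuals x variants),
# the column of G matching each df_beta variant (NA = not in G),
# and the effect sizes in df_beta's row order (e.g. beta_auto).
set.seed(42)
G <- as_FBM(matrix(sample(0:2, 10 * 5, replace = TRUE), nrow = 10, ncol = 5))
ind_in_G <- c(3L, 4L, NA)
beta     <- c(0.1, -0.2, 0.05)

# Keep only variants found in G; indices and betas are subset together.
keep <- !is.na(ind_in_G)
pred <- big_prodVec(G, beta[keep], ind.col = ind_in_G[keep])

# Same score computed in memory, as a sanity check.
pred_check <- drop(G[, ind_in_G[keep]] %*% beta[keep])
```

Variants dropped this way simply do not contribute to the score.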
Also, if you do not have imputed data, I would recommend directly using your own set of variants (instead of HapMap3) and computing the LD reference from them; otherwise, the overlap is very small.
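A minimal sketch of computing the LD (correlation) matrix from a study's own variants, using the example data shipped with bigsnpr as a stand-in for `G` and restricted to chromosome 1; in practice one such matrix is computed per chromosome. `snp_cor()` is bigsnpr's correlation function; without `infos.pos`, `size` is a window counted in variants, whereas the LDpred2 tutorial uses a window based on genetic positions.

```r
library(bigsnpr)

# Example genotypes shipped with bigsnpr, standing in for the study data.
obj <- snp_attachExtdata()
G   <- obj$genotypes
CHR <- obj$map$chromosome

# Sparse correlation matrix among this dataset's own chromosome-1 variants;
# `size = 500` is a window of 500 variants here (no `infos.pos` given).
ind.chr <- which(CHR == 1)
corr1 <- snp_cor(G, ind.col = ind.chr, size = 500)
dim(corr1)
```

Because the correlations come from the same variants as `G`, every variant kept afterwards is, by construction, present in the test data.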
Thank you, it worked well using my own data for the correlation matrix :)
Hi Florian,
When running `big_prodVec(G_imp, beta_auto, ind.col = df_beta[["_NUM_ID_"]])`, I get the error message `Error in pMatVec4(X, y.col, ind.row, ind.col, ncores = ncores) : Tested 501101 < 501100. Subscript out of bounds.` Here, 501,100 is the number of variants in `G`, and `G_imp` comes from `snp_fastImputeSimple(G)`.
When running `pred_auto <- big_prodMat(G_imp, beta_auto, ind.col = df_beta[["_NUM_ID_"]])`, I get a matrix containing only NAs.
Do you have any advice for me?
Thanks in advance!
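Two quick checks, sketched on hypothetical toy stand-ins for `G_imp`, `beta_auto` and the index vector. The `_NUM_ID_` column added by `snp_match()` gives row positions in the variant table passed to it as `info_snp`, so it indexes the columns of `G_imp` only when that table was `G_imp`'s own map; an index above `ncol(G_imp)` is a sign that it refers to another variant set. Separately, an NA anywhere in a column of `beta_auto` makes every prediction in that column NA, because NA propagates through the weighted sum.

```r
library(bigstatsr)

# Hypothetical toy stand-ins: a 4 x 3 imputed genotype matrix and two
# columns of effect sizes, the second containing one NA.
G_imp <- as_FBM(matrix(c(0, 1, 2,
                         1, 1, 0,
                         2, 2, 1,
                         0, 2, 2), nrow = 4, byrow = TRUE))
beta_auto <- cbind(c(0.1, -0.2, 0.05), c(0.1, NA, 0.05))
ind.col <- c(1L, 2L, 3L)  # plays the role of df_beta[["_NUM_ID_"]]

# 1) Every index must lie in 1..ncol(G_imp), otherwise big_prodVec() /
#    big_prodMat() stop with "Subscript out of bounds".
stopifnot(all(ind.col >= 1 & ind.col <= ncol(G_imp)))

# 2) Count NAs per column of betas and of predictions.
n_na_beta <- colSums(is.na(beta_auto))
pred_auto <- big_prodMat(G_imp, beta_auto, ind.col = ind.col)
n_na_pred <- colSums(is.na(pred_auto))
```

A column with NA betas gives NA for every individual, while a fully observed column gives finite predictions.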