privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/
183 stars 43 forks source link

Error when running example-with-provided-LD.R #486

Closed alhannae closed 3 months ago

alhannae commented 3 months ago

Hi Florian,

I ran your LDpred2 script without the pre-computed matrices successfully. However, when running the script with the pre-computed matrices, I ran into a few errors (https://github.com/privefl/paper-infer/blob/main/code/example-with-provided-LD.R)

There is a problem when reading in the breast cancer GWAS, I can not run the rest of the script because of this. However, when I adjust the script to the GWAS that I use for the script without the pre-computed matrices (the public data used here > https://privefl.github.io/bigsnpr/articles/LDpred2.html) , everything goes smoothly untill I have to calculate the scores:

pred_auto <- big_prodVec(G, beta_auto, ind.col = df_beta[["_NUMID"]]) Error: Tested 45338 < 45337. Subscript out of bounds.

any idea why this is? If needed, I can provide the full R script that I run.

Thanks in advance! Much appreciated.

Kind regards,

Hannae

privefl commented 3 months ago

Basically, max(df_beta[["NUM_ID"]]) is larger than ncol(G); you should verify the column indices you provide to this function.

privefl commented 3 months ago

For the error reading the GWAS summary statistics, please open another issue.

alhannae commented 3 months ago

Hi Florian,

The problem occurs when I run this:

in_test <- vctrs::vec_in(df_beta[, c("chr", "pos")], map_test[, c("chr", "pos")]) df_beta <- df_beta[in_test, ]

Since the _NUMID column refers to the matching of the hapmap file and sumstats [ info_snp <- snp_match(sumstats, map_ldref)].

I tried to fix it by adding this to the script (after the part where the genome-wide correlation matrix is calculated): df_beta2 <- snp_match(df_beta, map_test) where I match the two datasets so the _NUMID-column is adjusted.

When I then run: pred_auto <- big_prodVec(G, beta_auto, ind.col = df_beta2[["NUM_ID"]]) > this seems to work.

Is my work-around correct?

privefl commented 3 months ago

A good indication that df_beta2[["NUM_ID"]] is the right indices to use if when its range() is close to c(1, ncol(G)).

alhannae commented 3 months ago

Yes that is the case!

range(df_beta2[["_NUMID"]]) [1] 1 45332 c(1, ncol(G)) [1] 1 45337

I am just scared that maybe by matching the df_beta with map, something goes wrong with assigning the beta to the correct variant.

privefl commented 3 months ago

Seems fine. You'll see if you find some association between the PGS and the outcome.