privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Got only NA in calculating PRS #478

Closed: KirakiraZLY closed this issue 2 months ago

KirakiraZLY commented 4 months ago

Hi Florian,

I have been trying to use LDpred2 recently and follow the instructions from your tutorial. I ran it successfully when I used your dataset, but some problems occurred when I turned to my datasets.

I used FinnGen summary statistics (full dataset, converted to GRCh37/hg19) and UKBB (600K SNPs * 300K individuals) as the reference panel. I calculated beta_inf successfully, but I cannot get pred_inf: when I print it, I get only NA.

pred_inf <- big_prodVec(genotype, beta_inf, ind.row = ind.test, ind.col = info_snp$`_NUM_ID_`)

head(beta_inf)
[1] -1.626186e-04 5.349350e-04 -3.673646e-06 -1.245429e-03 1.014742e-03
[6] -1.352575e-04

nrow(genotype)
[1] 392214

length(beta_inf)
[1] 147383

length(ind.test)
[1] 392214

length(info_snp$`_NUM_ID_`)
[1] 147383

head(pred_inf)
[1] NA NA NA NA NA NA

length(pred_inf)
[1] 392214

I thought the reference panel might be too large, so I submitted the job to a cluster. The PRS still cannot be calculated, as you can see in line 35 of the following picture. [image]

Could you please advise on what might be going wrong here?

I'm looking forward to hearing back from you. 😊

Best regards, Zhang Leyi

privefl commented 4 months ago

I guess you have missing values in the genotype matrix you're using. Other similar issues have been reported here. You should probably use the imputed UKBB BGEN data, and also switch to HapMap3/HapMap3+ variants instead of the genotyping chip. Or simply do some quick imputation of the genotyping data (with snp_fastImputeSimple(., method = "mean2")) if there aren't too many missing values.
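To illustrate why missing genotypes lead to NA scores, here is a minimal base-R sketch on toy data. It does not use bigsnpr itself: the matrix `G`, the effect sizes `beta`, and the helper `impute_mean()` are made up for illustration, and the plain matrix product only stands in for what big_prodVec() computes. The mean imputation shown here is a simplified stand-in, not necessarily what snp_fastImputeSimple() does internally.

```r
# Toy genotype matrix: 4 individuals x 3 SNPs, coded 0/1/2, with two missing calls.
G <- matrix(c(0,  1,  2,
              1, NA,  0,
              2,  2, NA,
              0,  0,  1),
            nrow = 4, byrow = TRUE)
beta <- c(0.5, -0.2, 0.1)  # made-up effect sizes, one per SNP

# A single missing genotype in a row makes that individual's score NA.
pred_raw <- drop(G %*% beta)

# Count missing values per SNP (column) to see how widespread they are.
n_missing <- colSums(is.na(G))

# Hypothetical helper: replace each missing value by the mean genotype of its SNP.
impute_mean <- function(mat) {
  for (j in seq_len(ncol(mat))) {
    miss <- is.na(mat[, j])
    mat[miss, j] <- mean(mat[, j], na.rm = TRUE)
  }
  mat
}

# After imputation, every individual gets a finite score.
pred_imputed <- drop(impute_mean(G) %*% beta)

print(pred_raw)
print(n_missing)
print(pred_imputed)
```

With hundreds of thousands of SNPs, almost every individual has at least one missing call, which would explain a score vector made entirely of NA.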

KirakiraZLY commented 3 months ago

Hi Florian,

I am trying to use the auto model to calculate PRS. I previously ran the infinitesimal model successfully on my data (after dropping the missing values), and I also ran the auto model successfully with the tutorial data that you provided. However, when I replaced the genotype data obj.bigSNP with my own genotype data, multi_auto contained only NA.

Do you have any suggestions on this?

Best regards, Zhang Leyi

privefl commented 3 months ago

Are there NAs in the model now? There are directions you can follow in the tutorial, and in other issues here.

privefl commented 2 months ago

Any update on this?

KirakiraZLY commented 2 months ago

Hi,

Sorry, I forgot to reply. I still have NAs in the auto model. I also tried to run it using the datasets from this tutorial: https://choishingwan.github.io/PRS-Tutorial/ldpred/, but I got the same problem as with my own datasets.

privefl commented 2 months ago

KirakiraZLY commented 2 months ago

Hi,

Thanks, it works now.

However, missing SNPs are still a significant problem: my genotype set contains 1M SNPs, but if I set the missingness threshold to 0, only 100K remain, which is bad for the calculation. Is there any way to make this less strict?
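The effect of the missingness threshold can be seen in a small base-R sketch on simulated data. Everything here is an assumption for illustration: the simulated matrix `G`, the roughly 1% missing rate, and the 5% cutoff are arbitrary and are not taken from this thread or from bigsnpr.

```r
set.seed(1)
n_ind <- 200
n_snp <- 50

# Simulated genotypes coded 0/1/2, with 100 entries set to missing at random.
G <- matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.3), n_ind, n_snp)
G[sample(length(G), 100)] <- NA

# Per-SNP missing rate: fraction of individuals without a call.
miss_rate <- colMeans(is.na(G))

# A threshold of 0 keeps only SNPs with no missing call at all.
keep_strict <- which(miss_rate == 0)

# A lenient threshold (5% here, an arbitrary choice) keeps at least as many SNPs.
keep_lenient <- which(miss_rate <= 0.05)

cat("SNPs kept with threshold 0:   ", length(keep_strict), "\n")
cat("SNPs kept with threshold 0.05:", length(keep_lenient), "\n")
```

SNPs kept under the lenient threshold can still contain missing calls, so they would need to be imputed before computing scores, as suggested earlier in the thread.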

privefl commented 2 months ago

Are you doing some kind of QC step? I don't know which function you're using here.

KirakiraZLY commented 2 months ago

Hi,

I've used the imputation function snp_fastImputeSimple() to solve the missingness problem, and it seems to work now. However, I ran into another problem when computing the genotype correlation matrix: even after reducing my genotype dataset to 300K SNPs and 300K individuals, it is still very slow, around 1 to 2 hours for one chromosome. I then looked at your example R script, which contains this line:

corr_chr <- readRDS(paste0("data/corr_hm3_plus/LD_with_blocks_chr", chr, ".rds"))[ind.chr3, ind.chr3]

I don't know where I can download this .rds file. Could you point me to it?

privefl commented 2 months ago

Where to find the pre-computed LD matrices is mentioned in the LDpred2 tutorial.

I'm closing this now. If you have other questions, please open another issue.