JingZhang1227 closed this issue 3 years ago
For the alleles, it really depends; sometimes you don't know which one is which. What you need to match is your sumstats and your test data (the correlation matrix is invariant if you switch these two), so it is your job to make sure these are the same. Basically, if you get polygenic scores with negative predictive value (cor < 0 or AUC < 0.5) for all of them, that is a good sign that you should switch the alleles.
If you also want to match by the alleles, you can probably use snp_match() in this step.
Yes, you should use your test set as G, and you can use it to perform the QC on the scales (as you do not use the phenotype in the QC).
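For the allele-matching step mentioned above, here is a minimal sketch of what snp_match() does, using small made-up data frames instead of real files. The positions, alleles and effect sizes are invented for illustration only. In practice `info_snp` would presumably be built from the map of the test set (chr, pos, a0, a1), so that the returned `_NUM_ID_` column indexes the columns of G.

```r
library(bigsnpr)

# Toy summary statistics (hypothetical values); snp_match() expects
# the columns chr, pos, a0, a1 and beta.
sumstats <- data.frame(
  chr  = c(1, 1, 1, 1),
  pos  = c(100, 200, 300, 400),
  a0   = c("A", "C", "G", "A"),
  a1   = c("G", "T", "A", "C"),
  beta = c(0.1, -0.2, 0.3, 0.4)
)

# Toy variant map standing in for the test set (hypothetical values).
# The second variant has a0 and a1 swapped relative to the sumstats,
# and the variant at position 400 is absent.
info_snp <- data.frame(
  chr = c(1, 1, 1),
  pos = c(100, 200, 300),
  a0  = c("A", "T", "G"),
  a1  = c("G", "C", "A")
)

# Match on chr, pos and alleles; reversed alleles get the sign of beta flipped.
matched <- snp_match(sumstats, info_snp)

# `_NUM_ID_` gives the row of info_snp for each matched variant,
# `_NUM_ID_.ss` the row of the original sumstats.
print(matched[c("chr", "pos", "a0", "a1", "beta", "_NUM_ID_", "_NUM_ID_.ss")])
```

After matching, the alleles in `matched` follow `info_snp`, so effects computed from it refer to the same allele as the genotypes in the test set.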
Thank you very much for getting back to me! I will match the alleles in my summary statistics and test dataset carefully. Sorry, I didn't fully understand the QC part of the answer (in the third point). Would you mind elaborating a bit further on this?
Thanks again!
Got it! Thank you very much!
Hi Florian,
I am using LDpred2 to calculate polygenic risk scores. I am running the automatic model with the provided LD reference based on the example-with-provided-ldref.R.
I have three datasets: the summary statistics, the provided LD reference, and a test dataset (PLINK format). The positions in both the summary statistics and the test dataset are in hg38.
I was wondering if I could ask a few questions about the example-with-provided-ldref.R.
ind.test <- 1:nrow(G)
pred_auto <- big_prodMat(G, beta_auto, ind.row = ind.test,
                         ind.col = df_beta[["_NUM_ID_"]],
                         ncores = NCORES)

I assume the UKBB_imp_HM3.rds is used to calculate the provided LD reference. Because I used the G from my test dataset, and it is matched to df_beta only by chr and pos (not by a0 and a1), I was wondering if this will cause a problem here.
I really appreciate any suggestions!