Closed kathrynfreeman closed 1 year ago
The provided LD ref has many advantages, and you should prefer using it.
The main reason not to use it is if you do not have imputed data and the overlap between your test data and the LD ref is poor.
The second main reason is when you're building PGS using GWAS from another ancestry.
Thank you for your response! I'm so sorry for my confusion.
Once I use the ldref correlation code, can I just continue with the code in the tutorial for LDpred2-auto? It appears to run properly but I want to ensure I am calculating the right thing. Does the code in the example with ldref also deal with summary stats QC or does that need to be done separately?
Additionally, is the number of variants used to calculate the pred score represented by the number of observations within `df_beta`?
Thank you so much again, Kate
Sumstats QC needs to be performed before.
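For readers following along, here is a minimal sketch of one QC step of the kind meant here, run before anything is passed to LDpred2: compare the standard deviation of genotypes implied by the summary statistics with the one implied by reference allele frequencies, and drop variants where the two disagree. It uses base R only and made-up toy values. The column name `af_ref` is hypothetical, and the `2 / sqrt(...)` form assumes a binary trait with an effective sample size in `n_eff`; the thresholds are the ones I recall from the LDpred2 tutorial, so please check them against the current tutorial before relying on them.

```r
# Toy summary statistics (made-up values, for illustration only).
# `af_ref` is a hypothetical name for the allele frequency in the LD reference.
info_snp <- data.frame(
  beta    = c(0.02, -0.01, 0.30, 0.015),
  beta_se = c(0.0138, 0.021, 0.05, 0.0127),
  n_eff   = c(50000, 50000, 50000, 50000),
  af_ref  = c(0.30, 0.10, 0.25, 0.45)
)

# SD of genotypes implied by the reference allele frequencies
sd_ldref <- with(info_snp, sqrt(2 * af_ref * (1 - af_ref)))

# SD of genotypes implied by the summary statistics
# (assumes a binary trait analysed with an effective sample size)
sd_ss <- with(info_snp, 2 / sqrt(n_eff * beta_se^2 + beta^2))

# Flag variants where the two estimates disagree too much
is_bad <- sd_ss < (0.5 * sd_ldref) | sd_ss > (sd_ldref + 0.1) |
  sd_ss < 0.1 | sd_ldref < 0.05

# Keep only the variants that pass QC
df_beta <- info_snp[!is_bad, ]
```

In this toy data only the third variant is flagged, because its standard error is far too large for its allele frequency and sample size.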
Yes, all variants you have as input (in `df_beta` and `corr`) will be used.
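As a quick sanity check of the point above, one can confirm that the two inputs describe the same set of variants and read the variant count off `df_beta`. This is only a sketch on toy objects: a plain identity matrix stands in for the LD matrix, whereas in the real pipeline `corr` is a sparse on-disk object, so the exact accessors for its dimensions may differ.

```r
# Toy stand-ins (made-up values): three variants after matching and QC
df_beta <- data.frame(
  beta    = c(0.02, -0.01, 0.015),
  beta_se = c(0.0138, 0.021, 0.0127),
  n_eff   = c(50000, 50000, 50000)
)

# Identity matrix as a placeholder for the LD (correlation) matrix
corr <- diag(nrow(df_beta))

# The LD matrix must be square, with one row/column per row of df_beta
stopifnot(nrow(corr) == ncol(corr), nrow(df_beta) == ncol(corr))

# Number of variants that would enter the polygenic score
n_variants <- nrow(df_beta)
```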
Hi Dr. Privé,
I just wanted to make sure the correlation here from the example with provided ldref:
can be used in place of the below correlation code from the tutorial (when my dataset has fewer than 2000 individuals) to calculate `ldsc` and all the subsequent steps required to get `pred_auto`.
I use code similar to the ldref example code until the `h2_est <- ldsc[["h2"]]` step and then compute LDpred2-auto using the code from the polygenic scoring vignette/tutorial.
If a dataset is barely over 2000 individuals, is it harmful to still use the provided ldref example code pipeline until the `h2_est <- ldsc[["h2"]]` step?
Thank you in advance, Kate