Open chanwkimlab opened 4 years ago
There are 10M variants with MAF>1% and INFO>0.3 in the UKBB, which is more than enough for simulations.
For real data, I argue in this paper that these thresholds should be parameters of the predictive methods. I include INFO in the main results and briefly compare MAF thresholds for one disease; both parameters seem to be important to optimize.
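To make the distinction concrete, here is a minimal Python sketch of fixed QC versus treating the MAF and INFO thresholds as tuning parameters. The variant records, the helper `keep_variants`, and the grid values are all hypothetical and are not taken from the paper's code; the sketch only shows the idea of producing one candidate variant set per threshold combination.

```python
from itertools import product

# Hypothetical variant records (illustrative values, not real UKBB data).
variants = [
    {"id": "v1", "maf": 0.20, "info": 0.95},
    {"id": "v2", "maf": 0.005, "info": 0.99},
    {"id": "v3", "maf": 0.03, "info": 0.25},
    {"id": "v4", "maf": 0.012, "info": 0.45},
    {"id": "v5", "maf": 0.40, "info": 0.80},
]


def keep_variants(variants, min_maf, min_info):
    """Return ids of variants whose MAF and INFO both exceed the thresholds."""
    return [
        v["id"]
        for v in variants
        if v["maf"] > min_maf and v["info"] > min_info
    ]


# Fixed QC, as described for the simulations: MAF > 1% and INFO > 0.3.
simulation_set = keep_variants(variants, min_maf=0.01, min_info=0.3)

# Thresholds as tuning parameters: one candidate set per grid point.
maf_grid = [0.01, 0.05]      # assumed grid, for illustration only
info_grid = [0.3, 0.6, 0.9]  # assumed grid, for illustration only
candidate_sets = {
    (min_maf, min_info): keep_variants(variants, min_maf, min_info)
    for min_maf, min_info in product(maf_grid, info_grid)
}

if __name__ == "__main__":
    print("fixed QC:", simulation_set)
    for thresholds, ids in candidate_sets.items():
        print(thresholds, ids)
```

Each candidate set would then be passed to the predictive method, and the best threshold combination chosen on a training or validation set rather than fixed in advance.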
Thank you for your quick response. Now I understand that no QC was applied before optimization. Thank you again for your explanation.
I have an additional question.
In paper3-SCT/code_real/03-mult-small-T1D.R, line 250, the heritability parameter --h2 0.88 was used when running LDpred.
Where did you get the h2 value (e.g., from LDSC)?
Does the --h2 parameter significantly affect the result of LDpred?
I couldn't find a comment about this in the publication. I assume that is because it is a minor detail.
I took the values from here: https://www.snpedia.com/index.php/Heritability.
But I was told that this parameter is not very important, and you can skip it.
Dear Florian,
Firstly, I'd like to thank you for developing a robust PRS method, SCT.
I am reproducing the results of 'Making the Most of Clumping and Thresholding for Polygenic Scores'. However, I have a question regarding the QC criteria.
I can see that in the 'simulation' section, the QC criteria MAF>1% and INFO>0.3 were used for the UKB genotypes. In contrast, in the 'Real Summary Statistics' section, I couldn't find any mention of QC criteria for the UKB genotypes.
I found that many of your scripts load the path data/ukb_imp_mfi/ukb_mfi_chr. Does mfi refer to some kind of QC criteria, or is it just the output of the ukbgene imp command? (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/ukbgene_instruct.html)

Best regards,
Chanwoo Kim