privefl / simus-PRS

Simulations and comparisons of Polygenic Risk Scores methods.

QC criteria used for UKB genotype in Real Summary Statistics section #1

Open chanwkimlab opened 4 years ago

chanwkimlab commented 4 years ago

Dear Florian

Firstly, I'd like to thank you for developing a robust PRS method, SCT.

I am trying to reproduce the results of 'Making the Most of Clumping and Thresholding for Polygenic Scores', and I have a question regarding the QC criteria.

I can see that in the 'simulation' section, the QC criteria MAF > 1% and INFO > 0.3 were applied to the UKB genotypes. By contrast, I couldn't find any mention of QC criteria for the UKB genotypes in the 'Real Summary Statistics' section.

I noticed that many of your scripts load files from the path data/ukb_imp_mfi/ukb_mfi_chr. Does "mfi" refer to some kind of QC criterion, or is it just the output of the ukbgene imp command? (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/ukbgene_instruct.html)

Best regards,
Chanwoo Kim
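To illustrate how the simulation thresholds (MAF > 1%, INFO > 0.3) could be applied to per-variant MAF/INFO files such as the ones under data/ukb_imp_mfi/, here is a minimal sketch. It assumes each mfi file is headerless and whitespace-separated with 8 columns (variant ID, rsID, position, allele 1, allele 2, MAF, minor allele, INFO); check this layout against the actual files. The names `MfiRecord`, `read_mfi` and `passes_qc` are hypothetical, not part of the repository.

```python
from typing import Iterable, List, NamedTuple


class MfiRecord(NamedTuple):
    """One variant row of an mfi file (ASSUMED 8-column layout)."""
    variant_id: str
    rsid: str
    position: int
    allele1: str
    allele2: str
    maf: float
    minor_allele: str
    info: float


def read_mfi(lines: Iterable[str]) -> List[MfiRecord]:
    """Parse whitespace-separated mfi rows (assumed: no header line)."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        records.append(MfiRecord(
            fields[0], fields[1], int(fields[2]), fields[3], fields[4],
            float(fields[5]), fields[6], float(fields[7]),
        ))
    return records


def passes_qc(rec: MfiRecord, min_maf: float = 0.01, min_info: float = 0.3) -> bool:
    """Strict thresholds mirroring 'MAF > 1%' and 'INFO > 0.3'."""
    return rec.maf > min_maf and rec.info > min_info


if __name__ == "__main__":
    # Toy rows (made up for illustration), not real UKB variants.
    toy = (
        "var1 var1 10000 A G 0.40 G 0.90\n"
        "var2 var2 20000 C T 0.005 T 0.95\n"
        "var3 var3 30000 G A 0.20 A 0.25\n"
    )
    kept = [r.variant_id for r in read_mfi(toy.splitlines()) if passes_qc(r)]
    print(kept)
```

With these toy rows, only `var1` is kept: `var2` fails the MAF threshold and `var3` fails the INFO threshold.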

privefl commented 4 years ago

There are 10M variants with MAF > 1% and INFO > 0.3 in the UKBB, which is plenty for simulations.

For real data, I argue in this paper that these thresholds should be treated as parameters of the prediction methods. I include INFO in the main results and briefly compare MAF thresholds for one disease; both parameters seem to be important to optimize.
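Treating the MAF and INFO thresholds as tuning parameters means scoring each threshold combination on validation data and keeping the best one. The sketch below shows this idea only; it is not the SCT implementation, which combines many C+T scores over a grid of hyperparameters rather than picking a single one. The function name `grid_search_qc`, the grid values, and the use of Pearson correlation as the metric are assumptions for illustration (for a binary disease, a metric such as AUC would be more natural).

```python
import itertools

import numpy as np


def grid_search_qc(G, beta, maf, info, y,
                   maf_grid=(0.01, 0.05), info_grid=(0.3, 0.6, 0.9)):
    """Return (r, min_maf, min_info) for the threshold pair whose simple PRS
    correlates best with the validation phenotype y (hypothetical helper).

    G: (n_individuals, n_variants) genotype dosages of a validation set
    beta: per-variant effect sizes taken from GWAS summary statistics
    maf, info: per-variant MAF and INFO values (e.g. from mfi files)
    """
    best = None
    for min_maf, min_info in itertools.product(maf_grid, info_grid):
        keep = (maf > min_maf) & (info > min_info)
        if not keep.any():
            continue  # no variant passes: skip this combination
        prs = G[:, keep] @ beta[keep]
        if np.std(prs) == 0:
            continue  # constant score: correlation is undefined
        r = np.corrcoef(prs, y)[0, 1]
        if best is None or r > best[0]:  # strict '>' keeps the first of ties
            best = (float(r), min_maf, min_info)
    return best
```

The validation set used to pick the thresholds should be separate from the data used to estimate `beta`, and the final score should be assessed on yet another test set, otherwise the chosen thresholds are overfitted.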

chanwkimlab commented 4 years ago

Thank you for your quick response. I now understand that no QC was applied before optimization. Thanks again for the explanation.

chanwkimlab commented 4 years ago

I have an additional question. In paper3-SCT/code_real/03-mult-small-T1D.R (line 250), the heritability parameter --h2 0.88 was used when running LDpred.

Where did you get this h2 value (e.g. from LDSC)? Does the --h2 parameter significantly affect the results of LDpred? I couldn't find any mention of it in the publication; I guess this is because it is a minor detail.
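One way to get an h2 value from the summary statistics themselves, in the spirit of the LD score regression (ldsc) approach mentioned in the question, is a simple moment estimator: under the LD score model with the intercept fixed at 1, E[χ²_j] ≈ 1 + N·h2·ℓ_j / M, so averaging over variants gives h2 ≈ M·(mean χ² − 1) / (N·mean ℓ). The sketch below assumes you already have per-variant z-scores and LD scores; the function name `h2_from_summary_stats` is hypothetical, and this is not how the value 0.88 in the script was obtained.

```python
import numpy as np


def h2_from_summary_stats(z, ld_scores, n):
    """Aggregate moment estimate of SNP heritability from GWAS z-scores.

    Model (LD score regression with intercept fixed at 1):
        E[chi2_j] ~= 1 + n * h2 * l_j / m
    Averaged over variants and solved for h2:
        h2 ~= m * (mean(chi2) - 1) / (n * mean(l))
    z: per-variant z-scores; ld_scores: per-variant LD scores l_j;
    n: GWAS sample size. m is taken as the number of variants supplied
    (a simplification: ldsc uses the number of SNPs in the LD reference).
    """
    z = np.asarray(z, dtype=float)
    ld = np.asarray(ld_scores, dtype=float)
    m = z.size
    chi2 = z ** 2
    return m * (chi2.mean() - 1.0) / (n * ld.mean())
```

For a case-control trait such as T1D, this gives an observed-scale SNP heritability for the supplied variants, so it is not directly comparable to heritability values reported elsewhere.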

privefl commented 4 years ago

I used the values from here: https://www.snpedia.com/index.php/Heritability.

But I was told that this parameter is not very important and that you can skip it.