privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Questions about QC Process #341

Status: Closed (closed by jfrank94 2 years ago)

jfrank94 commented 2 years ago

I have some questions about the overall QC process based on your blog, Florian. There's an outlier check in chapter 6 based on MAF and the standard deviations of the internal/external datasets. Is that part of the QC process, or is the QC process only what is covered in chapters 3 and 4 (Pre-processing/Imputation and Population Structure Analysis, respectively)?

Secondly, can you explain the logic for this piece of code for detecting the outliers in chapter 6? Are these numbers based on confidence intervals or something else?

is_bad <- sd_ss < (0.5 * sd_val) | sd_ss > (sd_val + 0.1) | sd_ss < 0.1 | sd_val < 0.05

Lastly, is all of chapter 6 a standard setup for LDpred2 and other PRS methods (e.g. C+T, Lassosum, etc.)?

Thank you in advance for your help.

privefl commented 2 years ago

The bigsnpr-extdoc is more of an overview of what you can do. To build a PRS with a given method, refer to the tutorial for that method.

The QC on sd_ss or sd_val alone (sd_ss < 0.1, sd_val < 0.05) is basically a QC on MAF. The other two checks test for departure from sd_ss == sd_val.
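
To make the four conditions concrete, here is a minimal sketch in R on made-up values. It is not the tutorial's code: the vectors `af` and `sd_ss` are hypothetical toy inputs, `maf_at_sd` is a helper introduced only for this illustration, and `sd_val` is assumed to be the genotype standard deviation expected under Hardy-Weinberg equilibrium, sqrt(2 f (1 - f)). Only the `is_bad` expression is taken from the thread.

```r
# Toy values, made up for illustration only (not from any real dataset).
af     <- c(0.30, 0.001, 0.20, 0.40, 0.25)  # hypothetical allele frequencies (validation set)
sd_val <- sqrt(2 * af * (1 - af))           # assumed: genotype SD under Hardy-Weinberg
sd_ss  <- c(0.64, 0.05, 0.20, 0.90, 0.08)   # hypothetical SDs implied by the summary statistics

# The filter quoted in the thread, one condition per line.
is_bad <-
  sd_ss < (0.5 * sd_val) |  # sumstats SD much smaller than the validation SD
  sd_ss > (sd_val + 0.1) |  # sumstats SD much larger than the validation SD
  sd_ss < 0.1 |             # very low SD in the sumstats (MAF-like filter)
  sd_val < 0.05             # very low SD in the validation set (MAF-like filter)

print(is_bad)

# Hypothetical helper: the MAF at which the Hardy-Weinberg SD equals `sd`,
# obtained by solving 2 * f * (1 - f) = sd^2 for the smaller root f.
maf_at_sd <- function(sd) (1 - sqrt(1 - 2 * sd^2)) / 2
print(maf_at_sd(c(0.1, 0.05)))
```

With these toy inputs only the first variant passes. Under the Hardy-Weinberg assumption, the thresholds 0.1 and 0.05 correspond to a MAF of roughly 0.005 and 0.00125, which is one way to read "basically a QC on MAF". The first two conditions are different in kind: they compare the two standard deviations with each other, so a variant can fail them at any allele frequency.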

jfrank94 commented 2 years ago

Thank you for confirming that it's a QC process and it's based on MAF, but where did you derive the numbers from?

Also, on your blog you're using LDpred2 with this method, but can it be applied to other PRS methods (e.g. C+T or Lassosum)?

Also, can this QC step (outlier detection based on MAF) be considered part of the standard GWAS process? Furthermore, could it be combined with chapter 3's pre-processing prior to PRS modeling?

In a project I'm contributing to, we've already filtered out SNPs with MAF below 0.01, which is part of the standard GWAS process. Would this outlier detection still be needed on top of what we've implemented so far? Please explain how it fits into the standard GWAS QC.

privefl commented 2 years ago

Yes, this can be applied prior to any PGS method, or any sumstats-based method actually. It's kind of what we argue in this preprint. I guess C+T would be more robust to this. We also argue that this should be performed as part of the GWAS, e.g. before meta-analyzing results. The QC from chapter 3 is based on individual-level data, not summary statistics.

privefl commented 2 years ago

Any update on this?

jfrank94 commented 2 years ago

Apologies @privefl. Thank you for your help and explanations. That's pretty much it.