related-sciences / ukb-gwas-pipeline-nealelab

Pipeline for reproduction of NealeLab 2018 UKB GWAS

What exactly are we trying to reproduce? #1

Open eric-czech opened 4 years ago

eric-czech commented 4 years ago

Here is a timeline for the NealeLab GWAS results:

Some of the lineage for the GWAS is in https://github.com/Nealelab/UK_Biobank_GWAS, but it's not easy to follow. The v1/v2/v3 pipelines are not full rewrites, nor are they versioned in any consistent way, and the v2 pipeline (in imputed-v2-gwas) appears to reuse much of the data generated in v1. v3 then appears to be just reruns of v2 with slight changes to various files and parameters.

The lineage for phenotype preparation is even more difficult to follow. For example, https://github.com/astheeggeggs/PHESANT/blob/master/index.md suggests that outcome_info_final_round2.tsv is the right list of UKB phenotype variables to use, but outcome_info_final_round3.tsv also exists and was created around the time the v3 GWAS runs were done. The code that actually gets run is certainly more reliable than the documentation, and the UKBB_ldsc_scripts and ukbb_pan_ancestry repos contain (I think) all the information needed to figure out how to reproduce any one thing, at least as far as phenotypes are concerned.
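One way to see how the round2 and round3 phenotype lists differ is to compare their identifier columns directly. The sketch below is only illustrative: the column name `FieldID` is an assumption about the file layout (check the actual header first), the helper names `read_field_ids` and `compare_rounds` are made up for this example, and the inline rows are placeholders rather than real UKB data.

```python
import csv
import io


def read_field_ids(handle, id_column="FieldID"):
    """Return the set of non-empty values found in `id_column` of a TSV file.

    `id_column` is an assumed column name; adjust it to the real header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row.get(id_column) for row in reader if row.get(id_column)}


def compare_rounds(round2_handle, round3_handle, id_column="FieldID"):
    """Report which identifiers appear in only one list or in both."""
    r2 = read_field_ids(round2_handle, id_column)
    r3 = read_field_ids(round3_handle, id_column)
    return {
        "only_round2": sorted(r2 - r3),
        "only_round3": sorted(r3 - r2),
        "shared": sorted(r2 & r3),
    }


# Placeholder inputs standing in for outcome_info_final_round2.tsv and
# outcome_info_final_round3.tsv; the IDs and labels are not real UKB fields.
round2 = io.StringIO("FieldID\tField\n1\tfirst\n2\tsecond\n")
round3 = io.StringIO("FieldID\tField\n1\tfirst\n3\tthird\n")

diff = compare_rounds(round2, round3)
for key, ids in diff.items():
    print(key, ids)
```

With the real files, the `io.StringIO` objects would be replaced by handles from `open(path, newline="")`. A diff like this only shows which variables were added or dropped between rounds; it says nothing about which list the v3 runs actually consumed, which still has to be read from the code.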

The target for this pipeline will be the latest version of the v3 pipeline, without biomarker measurements. In practice, this means mirroring the v2 code and trying to match the changes to phenotype prep, variant annotation, and QC that have been made since the first v2 run.