This repo contains pipeline files for the reference-aware mtSwirl pipeline as well as the code used to run, merge, and annotate the results.
This pipeline was released as part of the manuscript Nuclear genetic control of mtDNA copy number and heteroplasmy in humans, published in Nature. If you use these resources in your work, please cite Gupta et al. 2023 Nature:
Gupta, R., Kanai, M., Durham, T.J. et al. Nuclear genetic control of mtDNA copy number and heteroplasmy in humans. Nature, in press. https://doi.org/10.1038/s41586-023-06426-5.
In All of Us (AoU), individual level data corresponding to mtDNA copy number (before and after covariate correction) and the post-QC variant callset can be found in the Nuclear genetic control of mtDNA copy number and heteroplasmy in humans workspace. Note that controlled tier access is required to clone this workspace.
Summary statistics from UKB are available on the GWAS Catalog and on GCP. In the GWAS Catalog files, the effect_allele and other_allele columns were originally reversed; these files have since been updated. No other columns were changed, and no data deposited in other locations (e.g., GCP, All of Us; see below) required updating.
On GCP, the summary statistics are in the gs://mito-wgs-public-2023 bucket. Please note that this is a requester pays bucket. This bucket also contains ukb_b37_b38_lifted_variants.tsv.bgz, which maps GRCh37 coordinates in the UKB data to GRCh38. The summary statistics on GCP correspond to the same data but are stored using the Pan UKB schema. These files contain the cross-ancestry meta-analysis as well as the per-ancestry association statistics, and are thus more comprehensive than those on the GWAS Catalog. The schema is described in the README_ukb.md file in the gs://mito-wgs-public-2023 bucket.
Summary statistics from AoU are available in the Nuclear genetic control of mtDNA copy number and heteroplasmy in humans workspace, in the same format as the UKB summary statistics on GCP. Note that controlled tier access is required to clone this workspace.
See Supplementary table 1 for sample size information.
Please note that at the time of writing, there is no mechanism by which custom workspaces in AoU can be made available to anyone with controlled tier access. Thus, we ask that in the interim, any users who desire to work with these data in AoU contact us to be added to the workspace. We are committed to making these data automatically available when this mechanism becomes available, and plan to beta-test this functionality when it is possible to do so.
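Once ukb_b37_b38_lifted_variants.tsv.bgz has been copied out of the requester pays bucket, it can be loaded into an in-memory GRCh37 to GRCh38 lookup. The sketch below is a minimal, stdlib-only illustration under stated assumptions: the column names locus_b37 and locus_b38 are hypothetical placeholders (check the real header), and treating an empty GRCh38 value as "did not lift over" is also an assumption.

```python
import csv
import gzip
from typing import Dict

# Hypothetical column names: check the real header of
# ukb_b37_b38_lifted_variants.tsv.bgz and adjust these.
B37_COLUMN = "locus_b37"
B38_COLUMN = "locus_b38"


def load_liftover_map(
    path: str, b37_col: str = B37_COLUMN, b38_col: str = B38_COLUMN
) -> Dict[str, str]:
    """Build a GRCh37 -> GRCh38 lookup from a (b)gzipped tab-separated file.

    bgzip output is a series of standard gzip members, so Python's gzip
    module can read it sequentially without htslib.
    """
    mapping: Dict[str, str] = {}
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = {b37_col, b38_col} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"columns not found in header: {sorted(missing)}")
        for row in reader:
            b37, b38 = row[b37_col], row[b38_col]
            # Keep only rows with a non-empty GRCh38 value (assumed to mean
            # the variant lifted over successfully).
            if b38:
                mapping[b37] = b38
    return mapping
```

Call load_liftover_map with the local path and the file's actual column names. For the full file, a Hail or pandas reader will scale better than a Python dict; the sketch only shows the mapping itself.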
See the WDL folder for the self-contained WDL. The v2.5_MongoSwirl_Single folder contains the single-sample pipeline oriented for use with Cromwell. The v2.6_MongoSwirl_Multi folder contains a multi-sample pipeline for use on the UKB Research Analysis Platform using dxCompiler. This folder also contains supporting scripts and the reference NUMTs used to generate nucDNA self-reference sequences. See the manuscript Methods for more details.
The generate_mtdna_call_mt folder contains code used to merge single-sample VCFs into Hail MatrixTables. This code was originally written as an extension of code previously released for mtDNA analysis (Laricchia et al. 2022 Genome Res). Scripts in the root of this folder work on any platform; scripts in each sub-folder are platform specific.
On the UKB Research Analysis Platform, run dx_pipeline.sh to run the merging pipeline.
In AoU, use aou_mtdna_analysis_launcher.sh to run the WDL. Tweak the parameters in its header for your configuration.
The merging and annotation scripts are:
aou_annotate_coverage.py
aou_combine_vcfs.py
annotate_coverage.py
combine_vcfs.py
process_sample_stats.py
add_annotations.py
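Conceptually, merging turns many single-sample VCFs into one variant by sample matrix, which is the shape a Hail MatrixTable stores (rows keyed by variant, columns keyed by sample, per-sample entry fields). The pure-Python sketch below illustrates only that shape; it is not the repo's Hail code, and its input (per-sample dicts mapping a hypothetical (position, ref, alt) key to a heteroplasmy level) stands in for parsed VCF records.

```python
from typing import Dict, List, Optional, Tuple

# A variant key: (position on chrM, ref allele, alt allele).
Variant = Tuple[int, str, str]


def merge_sample_calls(
    calls_by_sample: Dict[str, Dict[Variant, float]],
) -> Tuple[List[Variant], List[str], List[List[Optional[float]]]]:
    """Union per-sample calls into a dense sites x samples matrix.

    Rows are the union of all variants seen in any sample (sorted by
    position), columns are samples (sorted by ID), and entries hold the
    heteroplasmy level. A sample with no call at a site gets None, which
    means "not called" rather than "reference"; telling those apart in a
    real callset requires per-sample coverage at that position.
    """
    samples = sorted(calls_by_sample)
    sites = sorted({v for calls in calls_by_sample.values() for v in calls})
    matrix = [[calls_by_sample[s].get(site) for s in samples] for site in sites]
    return sites, samples, matrix
```

The None entries are why a merged callset needs more than the VCFs alone: missing calls must be resolved against coverage before downstream QC.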
To run GWAS in UKB, use the files in gwas_ukb. Using the outputs of QC, we run covariate correction with generate_covariate_corrected_traits.Rmd for mtCN (and for sensitivity analyses). To produce the final heteroplasmy phenotypes, we use produce_final_HL_traits.Rmd. We use saige_pan_ancestry_custom.py to run SAIGE in UKB and custom_load_custom_sumstats_into_mt.py to combine the results into a MatrixTable.
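Covariate correction of a quantitative trait is commonly done by regressing the trait on covariates, keeping the residuals, and optionally applying a rank-based inverse normal transform. The stdlib-only sketch below shows that generic recipe under stated assumptions: the covariate set, the Blom offset (c = 3/8), and whether a transform is applied at all are illustrative choices, not the exact procedure in generate_covariate_corrected_traits.Rmd (see the manuscript Methods for that).

```python
from statistics import NormalDist
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear covariates?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residualize(y: Sequence[float], covariates: Sequence[Sequence[float]]) -> List[float]:
    """OLS residuals of y on an intercept plus covariates (covariates[i] = sample i)."""
    x = [[1.0, *row] for row in covariates]
    p = len(x[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    return [yi - sum(b * xi for b, xi in zip(beta, r)) for r, yi in zip(x, y)]


def rank_inverse_normal(values: Sequence[float], c: float = 3.0 / 8.0) -> List[float]:
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

For example, rank_inverse_normal(residualize(trait, covariate_rows)) yields a corrected, normalized trait; trait and covariate_rows are hypothetical inputs. Solving the normal equations directly is fine for a handful of covariates; a QR-based solver is numerically safer at scale.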
We use the files in gwas_aou to run GWAS in AoU. To produce custom PCs by recomputing them per ancestry, we use run_per_ancestry_pca.py. We run aou_run_full_hl_gwas.py to run the GWAS.
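Recomputing PCs per ancestry means running PCA separately on each ancestry group's samples, so each group's PCs capture structure within that group rather than differences between groups. The sketch below shows that grouping logic with a small stdlib-only PCA (power iteration with deflation on the sample Gram matrix). It is not run_per_ancestry_pca.py, which works on real genotype data at scale; the input dictionaries (sample to dosage vector, sample to ancestry label) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def _top_pcs(rows: List[List[float]], k: int, iters: int = 500) -> List[List[float]]:
    """Top-k PC scores per row (samples x features) via power iteration.

    Columns are mean-centered first. The sign of each PC is arbitrary.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Gram matrix G = X X^T; eigenvector u with eigenvalue lam gives scores sqrt(lam) * u.
    g = [[sum(a * b for a, b in zip(xi, xj)) for xj in x] for xi in x]
    scores = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start
        lam = 0.0
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(gi, v)) for gi in g]
            lam = sum(t * t for t in w) ** 0.5  # ||G v|| converges to top eigenvalue
            if lam == 0.0:
                break  # no variance left in this group
            v = [t / lam for t in w]
        s = lam ** 0.5
        for i in range(n):
            scores[i][c] = s * v[i]
        # Deflate: remove this component before finding the next one.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores


def per_ancestry_pcs(
    genotypes: Dict[str, Sequence[float]],
    ancestry: Dict[str, str],
    k: int = 2,
) -> Dict[str, List[float]]:
    """Compute k PCs separately within each ancestry group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample in sorted(genotypes):
        groups[ancestry[sample]].append(sample)
    result: Dict[str, List[float]] = {}
    for members in groups.values():
        rows = [[float(x) for x in genotypes[s]] for s in members]
        for s, sc in zip(members, _top_pcs(rows, k)):
            result[s] = sc
    return result
```

Because each group is centered and decomposed on its own, PC1 in one ancestry group is not comparable to PC1 in another; the per-group PCs are meant to be used as covariates within that group's GWAS.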