mtCN result from WDL output files

Hello! Great to hear that the run was successful. If you ran this using Terra/Cromwell you usually have the option to save outputs in a table – the mtDNA mean coverage can be found in "mean_coverage" and the mtDNA median coverage can be found in "median_coverage". This can also be found in the "stats_outputs" file, which contains all of the per-sample statistics computed during the pipeline run – usually named "[sample_name]_mtanalysis_diagnostic_statistics.tsv".

To get mtCN, you'll need to get the mean nuclear DNA coverage. Often this is pre-computed as part of WGS QC. If it is not available, you can use samtools idxstats, samtools flagstat, and GATK CollectQualityYieldMetrics to quickly approximate this number. We implement this in the multi pipeline here (idxstats and flagstat) and here (CollectQualityYieldMetrics).

In the methods section of the paper (in "Computing mean nucDNA coverage in UKB") you can find the formula we used to efficiently compute the approximate nuclear genome coverage depth. You can see how we combined the outputs of idxstats/flagstat to obtain the relevant values here.

Once you have the appropriate statistics, we computed mtCN using: 2 * mtDNA coverage / nucDNA coverage.

Hope this is helpful!

rahulg603 / mtSwirl

mtCN result from WDL output files #4