rahulg603 / mtSwirl

12 stars 7 forks source link

mtCN result from WDL output files #4

Open pkrithivasan opened 1 year ago

pkrithivasan commented 1 year ago

Hi, we have successfully run the v2.5_MongoSwirl_Single pipeline on a single WGS sample, and are reviewing the output files. Could you please point us to the output file that indicates the mtDNA copy number (mtCN)? Thanks!

rahulg603 commented 1 year ago

Hello! Great to hear that the run was successful. If you ran this using Terra/Cromwell you usually have the option to save outputs in a table – the mtDNA mean coverage can be found in "mean_coverage" and the mtDNA median coverage can be found in "median_coverage". This can also be found in the "stats_outputs" file, which contains all of the per-sample statistics computed during the pipeline run – usually named "[sample_name]_mtanalysis_diagnostic_statistics.tsv".

To get mtCN, you'll need to get the mean nuclear DNA coverage. Often this is pre-computed as part of WGS QC. If it is not available, you can use samtools idxstats, samtools flagstat, and GATK CollectQualityYieldMetrics to quickly approximate this number. We implement this in the multi pipeline here (idxstats and flagstat) and here (CollectQualityYieldMetrics).

In the methods section of the paper (in "Computing mean nucDNA coverage in UKB") you can find the formula we used to efficiently compute the approximate nuclear genome coverage depth. You can see how we combined the outputs of idxstats/flagstat to obtain the relevant values here.

Once you have the appropriate statistics, we computed mtCN using: 2 * mtDNA coverage / nucDNA coverage.

Hope this is helpful!