Discovery of Nuclear Mitochondrial Insertions (dinumt)

Description

This software identifies and genotypes nuclear insertions of mitochondrial origin (NUMTs) from whole-genome sequence data. It consists of two programs: dinumt (di-nu-mite), which identifies insertion sites in a single sample, and gnomit (geno-mite), which genotypes those sites across multiple samples. An additional program, clusterNumtsVcf, merges sites identified in multiple samples into a single file for genotyping.

Required third-party resources

A number of third party software packages are required by these programs:

In addition, you will need:

The genotyping step requires a sample index file containing sample-level information (mean insert size, coverage, etc.). A template is provided, and the relevant data can be obtained with GATK (DepthOfCoverage walker) and Picard (CollectInsertSizeMetrics) or with custom scripts. If you are running dinumt on CRAM files aligned to reference genome GRCh38, please use the corresponding .pl script in the folder.
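
As one way to collect the insert-size figures for the sample index file, the sketch below reads a Picard CollectInsertSizeMetrics output file and prints the mean insert size and its standard deviation. The function name insert_size_stats is hypothetical and not part of dinumt. It assumes the usual Picard layout (a "## METRICS CLASS" line followed by a tab-separated header row and data row), and it does not write the sample index file itself, whose columns are defined by the provided template.

```shell
# Sketch only: insert_size_stats is a hypothetical helper, not shipped with dinumt.
# Prints "<MEAN_INSERT_SIZE><TAB><STANDARD_DEVIATION>" from a Picard
# CollectInsertSizeMetrics output file given as the first argument.
insert_size_stats() {
    awk -F'\t' '
        /^## METRICS CLASS/ { want_header = 1; next }   # column names are on the next line
        want_header {
            for (i = 1; i <= NF; i++) col[$i] = i       # map column name -> column index
            if (!("MEAN_INSERT_SIZE" in col) || !("STANDARD_DEVIATION" in col)) exit 1
            want_header = 0; want_data = 1; next
        }
        want_data {
            # Use the first data row; Picard may emit one row per pair orientation.
            print $(col["MEAN_INSERT_SIZE"]) "\t" $(col["STANDARD_DEVIATION"])
            exit
        }
    ' "$1"
}

# Example call (the file name is an assumption):
#   insert_size_stats sample1.insert_size_metrics.txt
```

Looking the columns up by name rather than by position keeps the sketch independent of the exact column order, which can differ between Picard versions.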

Parameters

Additional information about various parameters below:

Example workflow

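An example workflow is as follows: run dinumt.pl on each sample, merge the per-sample calls with clusterNumtsVcf.pl, then genotype the merged sites across samples with gnomit.pl.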

dinumt.pl \
--mask_filename=refNumts.bed \
--input_filename=sample1.bam \
--reference=hs37d5.fa \
--min_reads_cluster=1 \
--include_mask \
--output_filename=sample1.vcf \
--prefix=sample1 \
--len_cluster_include=577 \
--len_cluster_link=1154 \
--insert_size=334.844984 \
--max_read_cov=29 \
--output_support \
--support_filename=sample1_support.sam
grep '^#' sample1.vcf > header.txt
cat *vcf | grep -v '^#' | vcf-sort.pl | clusterNumtsVcf.pl --reference=hs37d5.fa > data.txt
cat header.txt data.txt > merged.vcf

(If need be, the merged VCF can be split into smaller pieces so that multiple sets of sites are genotyped in parallel.)

gnomit.pl \
--input_filename=merged.vcf \
--mask_filename=refNumts.bed \
--info_filename=sampleInfo \
--output_filename=merged_geno.vcf \
--samtools=samtools \
--reference=hs37d5.fa \
--breakpoint \
--min_map_qual=13 \
--dir_tmp=/tmp \
--exonerate=exonerate \
--mt_filename=MT.fa
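
The note above about splitting the merged VCF could be handled along these lines. This is a minimal sketch, not part of dinumt: the function name split_vcf and the piece naming scheme are assumptions, and it relies on all header lines (those starting with #) coming before the site records.

```shell
# Sketch only: split_vcf is a hypothetical helper, not shipped with dinumt.
# Usage: split_vcf <input.vcf> <sites_per_piece> <output_prefix>
# Writes <output_prefix>.001.vcf, <output_prefix>.002.vcf, ... and repeats
# the full VCF header at the top of every piece.
split_vcf() {
    awk -v n="$2" -v prefix="$3" '
        /^#/ { header = header $0 "\n"; next }        # collect header lines
        {
            if (count % n == 0) {                      # time to start a new piece
                if (out != "") close(out)
                out = sprintf("%s.%03d.vcf", prefix, ++piece)
                printf "%s", header > out
            }
            print > out                                # append the site record
            count++
        }
    ' "$1"
}

# Example call (1000 sites per piece is an arbitrary choice):
#   split_vcf merged.vcf 1000 merged_part
```

Each piece can then be given to gnomit.pl through --input_filename, with a distinct --output_filename per piece.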

Citation

Contact

Questions: Please contact Ryan Mills at remills@umich.edu

04/16/2014