mitoNGS / MToolBox

A bioinformatics pipeline to analyze mtDNA from NGS data
http://sourceforge.net/projects/mtoolbox/?source=navbar
GNU General Public License v3.0
90 stars 38 forks source link

chrM naming issues for rCRS option #17

Closed ghost closed 8 years ago

ghost commented 8 years ago

Hi there,

I found several inconsistencies in chromosome naming scheme when using the pipeline with rCRS option, and I would like to make some suggestions.

First, in test_config.sh, instruction for setting mtdb_fasta incorrectly states that the parameter should be set to MT.fa when using rCRS as reference:

##OPTIONAL. If MToolBox default installation (install.sh) was used, mtdb_fasta is chrRSRS.fa (RSRS). To use rCRS sequence as reference, please set mtdb_fasta=MT.fa Otherwise, please specify the file name of the mitochondrial reference fasta sequence you want to use.
##
#mtdb_fasta=chrRSRS.fa

If mtdb_fasta is set to MT.fa, it later breaks the pipeline when it gets to assembleMTgenome.py stage because it's looking for MT.fa file in genome_fasta/ directory and the file does not exist. The instruction should say mtdb_fasta should be set to chrM.fa, reflecting the name change.

Second, chromosome names must be updated from chrRCRS to chrM in all rCRS related files in MToolBox/data/directory (i.e. chrRCRS.fa, chrRCRS.fa.fai, chrRCRS.dict, intervals_file_RCRS.list, MITOMAP_HMTDB_known_indels_RCRS.vcf). Otherwise, this breaks the pipeline at indel realigning stage because GATK complains about not being able to match chrM with chrRCRS as below:

NFO  16:14:45,697 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  16:14:45,699 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:27:51 
INFO  16:14:45,699 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  16:14:45,699 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  16:14:45,703 HelpFormatter - Program Args: -T IndelRealigner -R bio/mtoolbox/MToolBox/data/chrRCRS.fa -I bio/m201/OUT_SAMPLE102/OUT.sam.bam -o bio/m201/OUT_SAMPLE102/OUT.realigned.bam -targetIntervals bio/mtoolbox/MToolBox/data/intervals_file_RCRS.list -known bio/mtoolbox/MToolBox/data/MITOMAP_HMTDB_known_indels_RCRS.vcf -compress 0 
INFO  16:14:45,703 HelpFormatter - Date/Time: 2016/07/27 16:14:45 
INFO  16:14:45,704 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  16:14:45,704 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  16:14:45,714 ArgumentTypeDescriptor - Dynamically determined type of bio/mtoolbox/MToolBox/data/MITOMAP_HMTDB_known_indels_RCRS.vcf to be VCF 
INFO  16:14:46,321 GenomeAnalysisEngine - Strictness is SILENT 
INFO  16:14:46,405 GenomeAnalysisEngine - Downsampling Settings: No downsampling 
INFO  16:14:46,412 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
WARNING: BAM index file bio/m201/OUT_SAMPLE102/OUT.sam.bam.bai is older than BAM bio/m201/OUT_SAMPLE102/OUT.sam.bam
INFO  16:14:46,425 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  16:14:46,440 RMDTrackBuilder - Loading Tribble index from disk for file bio/mtoolbox/MToolBox/data/MITOMAP_HMTDB_known_indels_RCRS.vcf 
INFO  16:14:47,444 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.7-4-g6f46d11): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Input files reads and reference have incompatible contigs: No overlapping contigs found.
##### ERROR   reads contigs = [chrM]
##### ERROR   reference contigs = [chrRCRS]
##### ERROR ------------------------------------------------------------------------------------------

Best, Jessica

domenico-simone commented 8 years ago

@leejae8 @robertopreste can we close this issue?

robertopreste commented 8 years ago

Yes, this issue should be fixed now.