mitoNGS / MToolBox

A bioinformatics pipeline to analyze mtDNA from NGS data
http://sourceforge.net/projects/mtoolbox/?source=navbar
GNU General Public License v3.0

No annotation.csv found #77

Status: Closed (Nikeeta-C closed this issue 5 years ago)

Nikeeta-C commented 5 years ago

When I run MToolBox on the sample data provided in the test folder, I get the error 'No annotation.csv found'.

The error log is as follows:

setting up MToolBox environment variables... ...done

setting up MToolBox variables in config file ... ...done

test_try will be used as vcf file name...

Check python version... (2.7 required) OK.

Checking files to be used in MToolBox execution...

Checking mapExome parameters... OK.

Checking assembleMTgenome parameters... OK.

Checking mt-classifier parameters... OK.

Input type is fastq. output files will be placed in /path/MToolBox/test/sim_data/

EXECUTING READ MAPPING WITH MAPEXOME...

mapExome for sample simulation100X, files found: simulation100X.R1.fastq simulation100X.R2.fastq

Mapping onto mtDNA...

/path/MToolBox/bin/gmap/bin/gsnap -D /path/MToolBox/gmapdb/ --gunzip -d chrM -A sam --nofails --pairmax-dna=500 --query-unk-mismatch=1 --read-group-id=sample --read-group-name=sample --read-group-library=sample --read-group-platform=sample -n 1 -Q -O -t 8 simulation100X.R1.fastq simulation100X.R2.fastq > /path/MToolBox/test/sim_data//OUT_simulation100X/outmt.sam 2> /path/MToolBox/test/sim_data//OUT_simulation100X/logmt.txt

Extracting FASTQ from SAM...

Mapping onto complete human genome...single reads

Mapping onto complete human genome...pair reads

Reading Results...

Filtering reads...

Outfile saved on /path/MToolBox/test/sim_data//OUT_simulation100X/OUT.sam.

Done.

SAM files post-processing...

SORTING OUT.sam FILES WITH PICARDTOOLS...

net.sf.picard.sam.SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate TMP_DIR=[/path/MToolBox/test/sim_data/OUT_simulation100X/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false

Runtime.totalMemory()=214958080

Success.

Skip Indel Realigner...

Skipping Mark Duplicates...

net.sf.picard.sam.SamFormatConverter INPUT=OUT.sam.bam.marked.bam OUTPUT=OUT.sam.bam.marked.bam.marked.sam TMP_DIR=[/path/MToolBox/test/sim_data/OUT_simulation100X/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false

ASSEMBLING MT GENOMES WITH ASSEMBLEMTGENOME...

WARNING: values of tail < 5 are deprecated and will be replaced with 5

[mpileup] 1 samples in 1 input files

Set max per-file depth to 8000

##### GENERATING VCF OUTPUT...

Reference sequence used for VCF: RCRS

##### PREDICTING HAPLOGROUPS AND ANNOTATING/PRIORITIZING VARIANTS...

Haplogroup predictions based on RSRS Phylotree build 17

Your best results file is mt_classification_best_results.csv

Loading contig sequences from file simulation100X-contigs.fasta

Loaded 1 contig sequences

Aligning Contigs to mtDNA reference genome...

Sequence haplogroup assignment

Classification according to tree: /path/MToolBox/data/phylotree_r17.pickle

genome_state is incomplete

OrderedDict()

====================

------------------------------

Contig alignment to MHCS and rCRS

Unable to compute haplogroup. Exit

Parsing pathogenicity table...

Parsing variability data...

Parsing info about haplogroup-defining sites...

Parsing info about haplogroup assignments...

No annotation.csv found. Exit

I tried running 'git pull' to get the updated files, as discussed in issue #28, but I still get the same error.

domenico-simone commented 5 years ago

Hi,

Can you please try running the test on the real sample, as detailed in https://github.com/mitoNGS/MToolBox/blob/master/test/HG00119_example/run_mtoolbox_on_test_file.md ? The simulated data are generated from the RSRS reference sequence, so haplogroup prediction does not work on them, and this in turn breaks the downstream steps.

Thanks,

Domenico
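
For anyone hitting the same message, here is a minimal Python sketch that checks a captured run log for the two messages seen in the log above ("Unable to compute haplogroup" followed later by "No annotation.csv found"). The function name `diagnose` and the wording of its return values are hypothetical and are not part of MToolBox. The only thing taken from this thread is the pair of log messages, plus the explanation that a failed haplogroup prediction affects the downstream steps.

```python
# Hypothetical helper, NOT part of MToolBox. It only searches a captured
# log for the two messages quoted in this issue.

HAPLOGROUP_FAILURE = "Unable to compute haplogroup"
ANNOTATION_MISSING = "No annotation.csv found"


def diagnose(log_text):
    """Return a short, human-readable diagnosis for a captured run log."""
    haplo_failed = HAPLOGROUP_FAILURE in log_text
    annotation_missing = ANNOTATION_MISSING in log_text

    if haplo_failed and annotation_missing:
        # Matches the situation in this issue: the annotation error is
        # likely a downstream effect of the failed haplogroup prediction.
        return ("haplogroup prediction failed; the missing annotation.csv "
                "is likely a downstream effect")
    if annotation_missing:
        return ("annotation.csv is missing, but no haplogroup failure "
                "was reported")
    if haplo_failed:
        return "haplogroup prediction failed"
    return "no known failure message found"
```

One possible use is to save the console output of a run to a text file, read it with `open(path).read()`, and pass the result to `diagnose`. If it reports the first case while running on the simulated data, the suggestion above (running on the real sample instead) is the thing to try.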