mitoNGS / MToolBox

A bioinformatics pipeline to analyze mtDNA from NGS data
http://sourceforge.net/projects/mtoolbox/?source=navbar
GNU General Public License v3.0

No annotation.csv found. #42

Open jich-MCTP opened 6 years ago

jich-MCTP commented 6 years ago

When running the sample data HG00119, I get the error "No annotation.csv found."

The following is the conf file:

#!/bin/bash

mtdb_fasta=chrM.fa
hg19_fasta=hg19RCRS.fa
mtdb=chrM
humandb=hg19RCRS
input_path=/ubuntu/test/
output_name=/ubuntu/test/
list=HG00119.lst
input_type=fastq
ref=RCRS

The following is the log file:

setting up MToolBox environment variables... ...done

setting up MToolBox variables in config file ... ...done

Check python version... (2.7 required) OK.

Checking files to be used in MToolBox execution...

Checking mapExome parameters... OK.

Checking assembleMTgenome parameters... OK.

Checking mt-classifier parameters... OK.

Input type is fastq. output files will be placed in /ubuntu/test/

EXECUTING READ MAPPING WITH MAPEXOME...

mapExome for sample SRR043366, files found: SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz
Mapping onto mtDNA...
/ubuntu/MToolBox/bin/gmap/bin/gsnap -D /ubuntu/MToolBox/gmapdb/ --gunzip -d chrM -A sam --nofails --pairmax-dna=500 --query-unk-mismatch=1 --read-group-id=sample --read-group-name=sample --read-group-library=sample --read-group-platform=sample -n 1 -Q -O -t 8 SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz > /ubuntu/test//OUT_SRR043366/outmt.sam 2> /ubuntu/test//OUT_SRR043366/logmt.txt
Extracting FASTQ from SAM...
Reading Results...
Filtering reads...
Outfile saved on /ubuntu/test//OUT_SRR043366/OUT.sam.
Done.

SAM files post-processing...

SORTING OUT.sam FILES WITH PICARDTOOLS...

[Wed Oct 11 20:10:41 UTC 2017] net.sf.picard.sam.SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate TMP_DIR=[/ubuntu/test/OUT_SRR043366/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Oct 11 20:10:41 UTC 2017] Executing as root@0164a315b55d on Linux 4.4.0-1022-aws amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11; Picard version: 1.98(1547)
INFO 2017-10-11 20:10:41 SortSam Finished reading inputs, merging and writing to output now.
[Wed Oct 11 20:10:41 UTC 2017] net.sf.picard.sam.SortSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=16252928
Success.

Skip Indel Realigner...
Skipping Mark Duplicates...
[Wed Oct 11 20:10:41 UTC 2017] net.sf.picard.sam.SamFormatConverter INPUT=OUT.sam.bam.marked.bam OUTPUT=OUT.sam.bam.marked.bam.marked.sam TMP_DIR=[/ubuntu/test/OUT_SRR043366/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Oct 11 20:10:41 UTC 2017] Executing as root@0164a315b55d on Linux 4.4.0-1022-aws amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11; Picard version: 1.98(1547)
[Wed Oct 11 20:10:41 UTC 2017] net.sf.picard.sam.SamFormatConverter done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=16252928

ASSEMBLING MT GENOMES WITH ASSEMBLEMTGENOME...

WARNING: values of tail < 5 are deprecated and will be replaced with 5

[mpileup] 1 samples in 1 input files

Set max per-file depth to 8000
##### GENERATING VCF OUTPUT...
Reference sequence used for VCF: RCRS
##### PREDICTING HAPLOGROUPS AND ANNOTATING/PRIORITIZING VARIANTS...
Haplogroup predictions based on RSRS Phylotree build 16
Unable to compute haplogroup. Exit
Your best results file is mt_classification_best_results.csv
Loading contig sequences from file SRR043366-contigs.fasta
Parsing pathogenicity table...
Parsing variability data...
Parsing info about haplogroup-defining sites...
Parsing info about haplogroup assignments...
No annotation.csv found. Exit

jalwillcox commented 6 years ago

Did you ever get this figured out? I'm running into the same (or a similar) problem. It looks like it may be related to where GSNAP is called in MToolBox-master/MToolBox/mapExome.py. The log file for that call is .../OUT_SRR043366/logmt.txt and (for me) looks like:


GSNAP version 2015-12-31 called with args: /home/jaw61/bin/MToolBox-master/bin/gmap/bin/gsnap -D /home/jaw61/bin/MToolBox-master/gmapdb/chrM/ --gunzip -d chrM -A sam --nofails --pairmax-dna=500 --query-unk-mismatch=1 --read-group-id=sample --read-group-name=sample --read-group-library=sample --read-group-platform=sample -n 1 -Q -O -t 8 SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz
Neither novel splicing (-N) nor known splicing (-s) turned on => assume reads are DNA-Seq (genomic)
Checking compiler assumptions for popcnt: 6B8B4567 __builtin_clz=1 __builtin_ctz=0 __builtin_popcount=17
Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
Finished checking compiler assumptions
Allocating memory for compressed genome (oligos)...Attached existing memory for /home/jaw61/bin/MToolBox-master/gmapdb/chrM//chrM.genomecomp...done (6,216 bytes, 0.00 sec)
Allocating memory for compressed genome (bits)...Attached existing memory for /home/jaw61/bin/MToolBox-master/gmapdb/chrM//chrM.genomebits128...done (6,240 bytes, 0.00 sec)
No suffix array for genome
Cannot find genomic index files in either current or old format.  Looking for files containing ref

That's as far as I've figured it out, though... I'm not very familiar with GSNAP, so it's been a little tricky to troubleshoot. Hopefully this info helps, or narrows down the issue some!
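For anyone who wants to check their own run for the same symptom, here is a minimal sketch in Python 3 that scans a GSNAP log such as `logmt.txt` for the index error quoted above. The function name `find_gsnap_index_errors` and the constant are hypothetical, and the only message it looks for is the one copied from this log; it is not a complete list of GSNAP failure messages.

```python
# Message copied from the logmt.txt excerpt in this thread.
# Assumption: this single string is enough to recognise the failure;
# GSNAP may report index problems in other ways too.
GSNAP_INDEX_ERRORS = ("Cannot find genomic index files",)


def find_gsnap_index_errors(log_path):
    """Return the log lines that contain one of the known index error messages."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if any(message in line for message in GSNAP_INDEX_ERRORS):
                hits.append(line.rstrip("\n"))
    return hits
```

Calling `find_gsnap_index_errors(".../OUT_SRR043366/logmt.txt")` with your own output path returns an empty list when the message is absent. An empty list only means this particular message was not logged, not that the mapping step succeeded.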

gh-cweiss commented 2 years ago

I'm here in 2021 and running into this same issue. I've been wrangling with this tool all day and am really bummed I can't use it.

jalwillcox commented 2 years ago

It's been a while since I used this, but at some point I determined that the problem in my case was that I had a "." in the file name (e.g. "sample.1.fastq"), and the fix was as simple as swapping it for an underscore (e.g. "sample_1.fastq"). Maybe that's the issue? Sorry I didn't post this earlier - I hope that helps!
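If you have many files to rename, the swap described above could be sketched like this in Python 3. The helper names (`underscore_name`, `rename_fastqs`) and the list of extensions are assumptions for illustration, not part of MToolBox. Note that it would also rewrite names such as SRR043366.R1.fastq.gz, which the log earlier in this thread shows mapExome picking up, so check the result against the naming your MToolBox version expects and keep the dry run on until you are sure.

```python
from pathlib import Path

# Extensions kept intact while the rest of the name is cleaned up.
# Assumption: these cover the FASTQ files in your input directory.
KNOWN_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def underscore_name(filename):
    """Replace dots before the FASTQ extension with underscores."""
    for suffix in KNOWN_SUFFIXES:
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            return stem.replace(".", "_") + suffix
    return filename  # not a recognised FASTQ name, leave it untouched


def rename_fastqs(directory, dry_run=True):
    """Return (old, new) name pairs; rename on disk only if dry_run is False."""
    changes = []
    for path in sorted(Path(directory).iterdir()):
        new_name = underscore_name(path.name)
        if not path.is_file() or new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            continue  # never overwrite an existing file
        changes.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return changes
```

With the default `dry_run=True`, `rename_fastqs("/path/to/input")` only reports what would change, so the list can be reviewed before anything is touched on disk.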

gh-cweiss commented 2 years ago

I solved the issue. In my case, it was caused by a problem with the ncursesw library during installation, which meant samtools did not install properly. My guess is that, as a result, the .fa genomes were never indexed during installation. I fixed the ncursesw issue (reconfigured samtools/bin with ./configure --without-curses) and reinstalled samtools with ./install.sh -i samtools, but that alone didn't create the missing indices for the reference genomes. I then manually indexed the 4 reference genomes in MToolBox/genome_fasta with samtools faidx, and after that everything worked. I hope this can help someone in the future :)
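To see whether you are in the same situation, a small Python 3 check like the one below lists reference FASTA files that have no `.fai` index next to them (`samtools faidx ref.fa` writes `ref.fa.fai`). The function name `missing_faidx` and the file patterns are assumptions for illustration, and the check says nothing about the GSNAP databases under gmapdb.

```python
from pathlib import Path


def missing_faidx(fasta_dir, patterns=("*.fa", "*.fasta")):
    """Return FASTA files in fasta_dir that lack a non-empty '<name>.fai' index."""
    missing = []
    for pattern in patterns:
        for fasta in sorted(Path(fasta_dir).glob(pattern)):
            # samtools faidx writes the index beside the FASTA, with '.fai' appended
            index = fasta.with_name(fasta.name + ".fai")
            if not index.exists() or index.stat().st_size == 0:
                missing.append(fasta)
    return missing
```

For each path returned by `missing_faidx("MToolBox/genome_fasta")`, running `samtools faidx <path>` by hand is the manual indexing step described above.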

SangBeom-Bang commented 20 hours ago

I recently used this package for an analysis. Initially, I ran it on a Windows notebook using a virtual Ubuntu environment. During the process, I encountered an error stating that the annotation.csv file could not be found, even though the previous steps, including the VCF and fasta files, had been completed successfully. To troubleshoot the issue, I tried several solutions. Interestingly, when I ran the same code on an Ubuntu desktop, it worked perfectly. I hope this helps others who encounter a similar issue.