Open jich-MCTP opened 7 years ago
Did you ever get this figured out? I'm running into the same (or a similar) problem. It looks like it may be related to where GSNAP is called in MToolBox-master/MToolBox/mapExome.py. The log file for that call is .../OUT_SRR043366/logmt.txt and (for me) looks like:
GSNAP version 2015-12-31 called with args: /home/jaw61/bin/MToolBox-master/bin/gmap/bin/gsnap -D /home/jaw61/bin/MToolBox-master/gmapdb/chrM/ --gunzip -d chrM -A sam --nofails --pairmax-dna=500 --query-unk-mismatch=1 --read-group-id=sample --read-group-name=sample --read-group-library=sample --read-group-platform=sample -n 1 -Q -O -t 8 SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz
Neither novel splicing (-N) nor known splicing (-s) turned on => assume reads are DNA-Seq (genomic)
Checking compiler assumptions for popcnt: 6B8B4567 __builtin_clz=1 __builtin_ctz=0 __builtin_popcount=17
Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
Finished checking compiler assumptions
Allocating memory for compressed genome (oligos)...Attached existing memory for /home/jaw61/bin/MToolBox-master/gmapdb/chrM//chrM.genomecomp...done (6,216 bytes, 0.00 sec)
Allocating memory for compressed genome (bits)...Attached existing memory for /home/jaw61/bin/MToolBox-master/gmapdb/chrM//chrM.genomebits128...done (6,240 bytes, 0.00 sec)
No suffix array for genome
Cannot find genomic index files in either current or old format. Looking for files containing ref
That's as far as I've figured it out, though... I'm not very familiar with GSNAP, so it's been a little tricky to troubleshoot. Hopefully this info helps, or narrows down the issue some!
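For anyone triaging the same thing, here is a rough sketch of how the GSNAP log could be scanned for the two messages that stand out in the excerpt above. The function name `find_gsnap_problems` and the message list are my own, taken only from this log; I am not certain the suffix array line is fatal on its own, so treat a hit as a pointer rather than a diagnosis.

```python
# Messages copied from the logmt.txt excerpt in this thread (not an
# official GSNAP error list); extend as needed.
KNOWN_PROBLEMS = (
    "No suffix array for genome",
    "Cannot find genomic index files",
)


def find_gsnap_problems(log_text):
    """Return (line_number, line) pairs for lines containing a known message."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(message in line for message in KNOWN_PROBLEMS):
            hits.append((number, line.strip()))
    return hits


# Example use (the path is whatever your run produced):
# with open("OUT_SRR043366/logmt.txt") as handle:
#     for number, line in find_gsnap_problems(handle.read()):
#         print(number, line)
```

If the second message shows up, the gmapdb directory passed with -D is the first place I would look.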
I'm here in 2021 and running into this same issue. I've been wrangling with this tool all day and am really bummed I can't use it.
It's been a while since I used this, but at some point I determined that the problem in my case was that I had a "." in the file name (e.g. "sample.1.fastq"), and the fix was as simple as swapping it for an underscore (e.g. "sample_1.fastq"). Maybe that's the issue? Sorry I didn't post this earlier - I hope that helps!
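A small sketch of that rename, in case it saves someone some typing. The helper `underscore_name` and the suffix list are hypothetical, not part of MToolBox. It only computes the new name, so nothing is touched on disk. Note that the logs in this thread show files named like SRR043366.R1.fastq.gz being picked up, so check which naming your setup expects before renaming anything.

```python
# Suffixes to keep intact; an assumption, adjust to your own files.
KNOWN_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def underscore_name(filename):
    """Replace dots in the part before the FASTQ suffix with underscores."""
    for suffix in KNOWN_SUFFIXES:
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            return stem.replace(".", "_") + suffix
    # Not a FASTQ file we recognise: leave the name alone.
    return filename


# Example use, renaming for real once the new names look right:
# import os
# for name in os.listdir("."):
#     new_name = underscore_name(name)
#     if new_name != name:
#         os.rename(name, new_name)
```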
I solved the issue. In my case, it was because I had an issue with the ncursesw library during installation. As a result, samtools did not install properly. My guess is that it did not index the .fa genomes properly as a result during installation. I fixed the ncursesw issue (reconfigured samtools/bin ./configure --without-curses) and reinstalled samtools with ./install.sh -i samtools, but that didn't fix the issue of no indices for the reference genomes. I went in and manually indexed the 4 reference genomes within MToolBox/genome_fasta with samtools faidx. Then everything worked. I hope this can help someone in the future :)
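Roughly what the manual indexing step looks like if scripted. This is a sketch under assumptions: the helper names are mine, the FASTA extensions are a guess at what sits in genome_fasta, and it relies on `samtools faidx` writing its index next to the input as `<file>.fai`.

```python
import subprocess
from pathlib import Path

# Extensions treated as reference FASTA files (an assumption).
FASTA_SUFFIXES = (".fa", ".fasta")


def fastas_missing_index(filenames):
    """Return the FASTA names that have no matching '<name>.fai' in the list."""
    names = set(filenames)
    return sorted(
        name
        for name in names
        if name.endswith(FASTA_SUFFIXES) and name + ".fai" not in names
    )


def index_missing(genome_dir, samtools="samtools"):
    """Run 'samtools faidx' on every unindexed FASTA in genome_dir."""
    genome_dir = Path(genome_dir)
    missing = fastas_missing_index(p.name for p in genome_dir.iterdir())
    for name in missing:
        subprocess.run([samtools, "faidx", str(genome_dir / name)], check=True)
    return missing


# Example use (adjust the path to your install):
# index_missing("MToolBox/genome_fasta")
```

Only the first function is free of side effects, so it is the safe one to try first to see what would be indexed.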
I recently used this package for an analysis. Initially, I ran it on a Windows notebook using a virtual Ubuntu environment. During the process, I encountered an error stating that the annotation.csv file could not be found, even though the previous steps, including the VCF and fasta files, had been completed successfully. To troubleshoot the issue, I tried several solutions. Interestingly, when I ran the same code on an Ubuntu desktop, it worked perfectly. I hope this helps others who encounter a similar issue.
When running the sample data HG00119, I get the error "No annotation.csv found."
The following is the conf file:
#!/bin/bash
mtdb_fasta=chrM.fa
hg19_fasta=hg19RCRS.fa
mtdb=chrM
humandb=hg19RCRS
input_path=/ubuntu/test/
output_name=/ubuntu/test/
list=HG00119.lst
input_type=fastq
ref=RCRS
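A quick way to double-check what a conf file like the one above actually sets is to load it into a dictionary. This is only a sketch: `parse_conf` is a hypothetical helper, and it assumes plain key=value assignments with no shell expansion, which is all the file above contains.

```python
import shlex


def parse_conf(text):
    """Parse shell-style key=value assignments into a dict, skipping comments."""
    settings = {}
    # comments=True drops everything from '#' to the end of the line,
    # which also removes a shebang line.
    for token in shlex.split(text, comments=True):
        key, sep, value = token.partition("=")
        if sep and key:
            settings[key] = value
    return settings


# Example use (the file name is whatever you pass to MToolBox):
# with open("config.sh") as handle:
#     for key, value in sorted(parse_conf(handle.read()).items()):
#         print(key, "=", value)
```

Printing the parsed values next to the sample names in the log makes mismatches easier to spot, for instance a list file named for one sample while the log reports another.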
The following is the log file:
setting up MToolBox environment variables... ...done
setting up MToolBox variables in config file ... ...done
Check python version... (2.7 required) OK.
Checking files to be used in MToolBox execution...
Checking mapExome parameters... OK.
Checking assembleMTgenome parameters... OK.
Checking mt-classifier parameters... OK.
Input type is fastq. output files will be placed in /ubuntu/test/
EXECUTING READ MAPPING WITH MAPEXOME...
mapExome for sample SRR043366, files found: SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz
Mapping onto mtDNA...
/ubuntu/MToolBox/bin/gmap/bin/gsnap -D /ubuntu/MToolBox/gmapdb/ --gunzip -d chrM -A sam --nofails --pairmax-dna=500 --query-unk-mismatch=1 --read-group-id=sample --read-group-name=sample --read-group-library=sample --read-group-platform=sample -n 1 -Q -O -t 8 SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz > /ubuntu/test//OUT_SRR043366/outmt.sam 2> /ubuntu/test//OUT_SRR043366/logmt.txt
Extracting FASTQ from SAM...
Reading Results...
Filtering reads...
Outfile saved on /ubuntu/test//OUT_SRR043366/OUT.sam.
Done.
SAM files post-processing...
SORTING OUT.sam FILES WITH PICARDTOOLS...
[Wed Oct 11 20:10:41 UTC 2017] net.sf.picard.sam.SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate TMP_DIR=[/ubuntu/test/OUT_SRR043366/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Oct 11 20:10:41 UTC 2017] Executing as root@0164a315b55d on Linux 4.4.0-1022-aws amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11; Picard version: 1.98(1547)
INFO 2017-10-11 20:10:41 SortSam Finished reading inputs, merging and writing to output now.
[Wed Oct 11 20:10:41 UTC 2017] net.sf.picard.sam.SortSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=16252928
Success.
Skip Indel Realigner...
Skipping Mark Duplicates...
[Wed Oct 11 20:10:41 UTC 2017] net.sf.picard.sam.SamFormatConverter INPUT=OUT.sam.bam.marked.bam OUTPUT=OUT.sam.bam.marked.bam.marked.sam TMP_DIR=[/ubuntu/test/OUT_SRR043366/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Oct 11 20:10:41 UTC 2017] Executing as root@0164a315b55d on Linux 4.4.0-1022-aws amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11; Picard version: 1.98(1547)
[Wed Oct 11 20:10:41 UTC 2017] net.sf.picard.sam.SamFormatConverter done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=16252928
ASSEMBLING MT GENOMES WITH ASSEMBLEMTGENOME...
WARNING: values of tail < 5 are deprecated and will be replaced with 5
[mpileup] 1 samples in 1 input files
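One thing that might be worth checking from the log above: every Picard step finishes in 0.00 minutes, which could mean OUT.sam contains few or no reads (that is a guess, not something the log states). A minimal sketch for counting them, relying only on the SAM convention that header lines start with "@"; the function name `count_sam_records` is my own.

```python
def count_sam_records(sam_text):
    """Return (header_lines, alignment_records) for the text of a SAM file."""
    headers = 0
    records = 0
    for line in sam_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            headers += 1  # @HD, @SQ, @RG, @PG, @CO header lines
        else:
            records += 1  # everything else is an alignment record
    return headers, records


# Example use (path taken from the log above):
# with open("/ubuntu/test/OUT_SRR043366/OUT.sam") as handle:
#     print(count_sam_records(handle.read()))
```

If the record count is zero, the mapping step is the place to look, and the logmt.txt written by gsnap should say more.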