mitoNGS / MToolBox

A bioinformatics pipeline to analyze mtDNA from NGS data
http://sourceforge.net/projects/mtoolbox/?source=navbar
GNU General Public License v3.0
90 stars 38 forks source link

invalid option for gsnap #5

Closed asifzubair closed 8 years ago

asifzubair commented 8 years ago

Hi,

I'm using only a slightly newer version of gsnap than the one with which the indexes were made - as evidenced here:

root@e6ec59c14514:/MToolBox/test# gsnap --version
GSNAP version 2013-10-28 called with args: gsnap --version

GSNAP: Genomic Short Nucleotide Alignment Program
Part of GMAP package, version 2013-10-28
Build target: x86_64-unknown-linux-gnu
Features: pthreads enabled, no zlib, mmap available, littleendian, sigaction available, 64 bits available
Builtin functions: clz ctz popcount
SIMD functions: MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX
Sizes: off_t (8), size_t (8), unsigned int (4), long int (8)
Default gmap directory (compiled): /usr/local/share
Default gmap directory (environment): /usr/local/share
Maximum read length: 250
Thomas D. Wu, Genentech, Inc.
Contact: twu@gene.com

However, when I look in the logmt.txt I found the following error message:

GSNAP version 2013-10-28 called with args: /usr/local/bin/gsnap -D /usr/local/share/gmapdb/ --gunzip -d chrRSRS -A sam --nofails --pairmax-dna=500 --query-unk-mismatch=1 --read-group-id=sample --read-group-name=sample --read-group-library=sample --read-group-platform=sample -n 1 -Q -O -t 8 simulation100X.R1.fastq simulation100X.R2.fastq
/usr/local/bin/gsnap: unrecognized option '--gunzip'
For usage, run 'gsnap --help'

It seems that the option --gunzip is not recognized. I tried running the same command in the log file without the --gunzip option and it worked. Of course, I decompressed the indexes as well.

I think this option was removed in later versions of gsnap and this should reflect in the MToolBox pipeline.

Please help.

clody23 commented 8 years ago

Dear Asif, could you please check if the --gunzip option is supported by your gsnap version, by doing

gsnap --help | grep gunzip

and let me know if you get any result? It looks quite odd that the 2013-10-28 version does not support it, since I checked few gsnap versions ( the 2013-09-11, the 2013-11-27 which is newer than the one you're using and the 2015-09-29) and they all support it.

Thanks Claudia

asifzubair commented 8 years ago

Hi Claudia,

I think you are right - something broke in this version.

root@9ddf6ed30511:/# gsnap
GSNAP version 2013-10-28 called with args: gsnap
Checking compiler assumptions for popcnt: 6B8B4567 clz=1 clz=0 popcount=17 
Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
Checking compiler assumptions for SSE4.1: 103 -58 max=103
Need to specify the -d flag.  For usage, run 'gsnap --help'
root@9ddf6ed30511:/# gsnap --help | grep gunzip
GSNAP version 2013-10-28 called with args: gsnap --help
root@9ddf6ed30511:/# 

So, it really doesn't support compressed indexes. I agree that this is a little weird.

As always, thank you so much for your help.

Best !

a.

clody23 commented 8 years ago

Dear Asif, 1) I would suggest to download the latest version of GSNAP which supports --gunzip option. Please have a look at this link to get the latest GSNAP version: http://research-pub.gene.com/gmap/ or here for previous ones: http://research-pub.gene.com/gmap/archive.html 2) Yes, it will take few hours to index the human genome (you need to be patient :-P). You can reduce the memory requirements by setting a lower kmer size (which by default is 15, so you can reduce to 12) has explained here: http://research-pub.gene.com/gmap/src/README 3) It doesn't matter if your bam files were already aligned using another reference mitochondrial sequence. They will be converted to fastq and re-mapped using the chrRCRS.fa. In the VCF file you'll always see chrMT in the CHROM field, since the chromosome name has been hardcoded in the script to generate the VCF, either you use chrRCRS or chrRSRS as reference.
4) Yes, MToolBox handles ploidy and outputs a VCF file (version 4.0), enhanced with heteroplasmy. The script that computes the heteroplasmic ratio is the mtVariantCaller.py. More datails about this can be found in the Supplementary Data of the MToolBox publication (http://www.ncbi.nlm.nih.gov/pubmed/25028726)

Anyway, since many users are experiencing problems with gsnap indexing, we are going to release a mini-tutorial to generate the MToolBox GSNAP databases and also make some substantial changes to the MToolBox code to facilitate the setup of variables and executables used by the MToolBox pipeline.

Hope this helps. Best, Claudia

asifzubair commented 8 years ago

Hi @clody23 :

I think I found an answer why the gunzip option is missing.

From the gsnap github repo

GSNAP also has the ability to deal with files compressed with gzip, if
the configure script at compile time can find a zlib library in your
system (see Note 3 in the section above about building and installing
GMAP and GSNAP).

So, I guess I'll have to install zlib first. I'll close this thread after I have tested that.

Thank you for your clarifications on my other queries ! :)

Best,

a.

asifzubair commented 8 years ago

So, I installed zlib before I installed gsnap.

on ubuntu, this can be accomplished by this command:

apt-get update && apt-get install zlib1g-dev

I tested the gunzip option ...

root@5878b656bb26:/tmp/gmap-2013-09-30# gsnap --help | grep gunzip
GSNAP version 2013-09-30 called with args: gsnap --help
  --gunzip                       Uncompress gzipped input files

... and it worked.

Thank you for the help !

a.