Closed asifzubair closed 8 years ago
Dear Asif,
could you please check whether your gsnap version supports the --gunzip option by running
gsnap --help | grep gunzip
and let me know if you get any result? It is odd that the 2013-10-28 version does not support it: I checked a few gsnap versions (2013-09-11, 2013-11-27, which is newer than the one you're using, and 2015-09-29) and they all support it.
Thanks, Claudia
Hi Claudia,
I think you are right - something broke in this version.
root@9ddf6ed30511:/# gsnap
GSNAP version 2013-10-28 called with args: gsnap
Checking compiler assumptions for popcnt: 6B8B4567 clz=1 clz=0 popcount=17
Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
Checking compiler assumptions for SSE4.1: 103 -58 max=103
Need to specify the -d flag. For usage, run 'gsnap --help'
root@9ddf6ed30511:/# gsnap --help | grep gunzip
GSNAP version 2013-10-28 called with args: gsnap --help
root@9ddf6ed30511:/#
So, it really doesn't support compressed indexes. I agree that this is a little weird.
1) I looked through the gsnap archives but I can't find this particular version.
2) I tried building the indexes with build_gsnap_index.sh, but it took a LONG time and still did not complete. Also, the memory requirements were huge. Is this normal for gsnap? I was using the latest version.
3) In my BAM files the mitochondrial chromosome is named chrM, but in the sample VCF that you recently generated it appears as chrMT, and in the reference the FASTA sequence is simply chrRSRS.
4) Does MToolBox take care of ploidy for mtDNA? I think the major problem with using callers like GATK and samtools is that they assume a diploid genome. I suppose the paper accompanying MToolBox will answer this question, but I was wondering if you knew a quick answer.
As always, thank you so much for your help.
Best !
a.
Dear Asif,
1) I suggest downloading the latest version of GSNAP, which supports the --gunzip option. The latest version is available at http://research-pub.gene.com/gmap/ and previous ones at http://research-pub.gene.com/gmap/archive.html
2) Yes, it will take a few hours to index the human genome (you need to be patient :-P). You can reduce the memory requirements by setting a lower k-mer size (the default is 15, so you can reduce it to 12), as explained here: http://research-pub.gene.com/gmap/src/README
3) It doesn't matter if your BAM files were already aligned using another reference mitochondrial sequence. They will be converted to FASTQ and re-mapped using chrRCRS.fa. In the VCF file you'll always see chrMT in the CHROM field, since the chromosome name is hardcoded in the script that generates the VCF, whether you use chrRCRS or chrRSRS as the reference.
4) Yes, MToolBox handles ploidy and outputs a VCF file (version 4.0), enhanced with heteroplasmy. The script that computes the heteroplasmic ratio is mtVariantCaller.py. More details can be found in the Supplementary Data of the MToolBox publication (http://www.ncbi.nlm.nih.gov/pubmed/25028726)
Anyway, since many users are experiencing problems with gsnap indexing, we are going to release a mini-tutorial to generate the MToolBox GSNAP databases and also make some substantial changes to the MToolBox code to facilitate the setup of variables and executables used by the MToolBox pipeline.
Hope this helps. Best, Claudia
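As a rough illustration of point 2, the command for building an index with a smaller k-mer could look like the sketch below. The -D, -d and -k options are those described in the GMAP README linked above; the helper name gmap_build_cmd, the directory, the database name and the FASTA file name are placeholders for illustration, not MToolBox defaults. The helper only prints the command, since the indexing itself can take hours.

```shell
# Hypothetical helper: print (dry run) a gmap_build command line.
# Arguments: index directory, database name, k-mer size, FASTA file.
gmap_build_cmd() {
    printf 'gmap_build -D %s -d %s -k %s %s\n' "$1" "$2" "$3" "$4"
}

# Placeholder values; k-mer lowered from the default 15 to 12.
gmap_build_cmd ./gmapdb hg19_demo 12 genome.fa
# prints: gmap_build -D ./gmapdb -d hg19_demo -k 12 genome.fa
```

Once the printed command looks right, it can be copied and run as is.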
Hi @clody23 :
I think I found out why the --gunzip option is missing.
From the gsnap GitHub repo:
GSNAP also has the ability to deal with files compressed with gzip, if
the configure script at compile time can find a zlib library in your
system (see Note 3 in the section above about building and installing
GMAP and GSNAP).
So, I guess I'll have to install zlib first. I'll close this thread after I have tested that.
Thank you for your clarifications on my other queries ! :)
Best,
a.
So, I installed zlib before I installed gsnap.
On Ubuntu, this can be done with:
apt-get update && apt-get install zlib1g-dev
I tested the gunzip option ...
root@5878b656bb26:/tmp/gmap-2013-09-30# gsnap --help | grep gunzip
GSNAP version 2013-09-30 called with args: gsnap --help
--gunzip Uncompress gzipped input files
... and it worked.
Thank you for the help !
a.
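For anyone who wants to script the same check, here is a minimal sketch. The function name supports_gunzip is made up for this example; it only looks for the string --gunzip in whatever help text it receives on stdin, so it tells you nothing beyond what the grep above does.

```shell
# Hypothetical helper: succeeds if the help text on stdin mentions --gunzip.
supports_gunzip() {
    grep -q -- '--gunzip'
}

# Only run the check if gsnap is actually on PATH.
if command -v gsnap >/dev/null 2>&1; then
    if gsnap --help 2>/dev/null | supports_gunzip; then
        echo "gsnap lists --gunzip"
    else
        echo "gsnap does not list --gunzip; it may have been built without zlib"
    fi
fi
```

If the second message appears, installing zlib1g-dev and rebuilding gsnap, as described above, is the first thing to try.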
Hi,
I'm using only a slightly newer version of gsnap than the one with which the indexes were made, as evidenced here:
However, when I looked in logmt.txt I found the following error message:
It seems that the option --gunzip is not recognized. I tried running the same command from the log file without the --gunzip option and it worked. Of course, I decompressed the indexes as well.
I think this option was removed in later versions of gsnap, and this should be reflected in the MToolBox pipeline.
Please help.