kingsfordgroup / sailfish

Rapid Mapping-based Isoform Quantification from RNA-Seq Reads
http://www.cs.cmu.edu/~ckingsf/software/sailfish
GNU General Public License v3.0

Indexing fails #88

Closed aminzia closed 8 years ago

aminzia commented 8 years ago

Hi There,

I am trying to build the index for gencode transcripts, but every version of Sailfish I've tried fails with an error at the end. Here's the output I get when I run the index command:

sailfish index -t gencode.fa -o ./sailfish/gencode

writing log to ./sailfish/gencode/logs/sailfish_index.log
[2016-01-20 19:06:03.058] [jointLog] [info] computeBiasFeatures( {
[2016-01-20 19:06:03.058] [jointLog] [info] [gencode.fa]
[2016-01-20 19:06:03.058] [jointLog] [info] , ./sailfish/gencode/bias_feats.txt, false, 96

readFile: gencode.fa, file gencode.fa: processed 200 transcripts (1) transcripts/s
RapMap Indexer

[Step 1 of 4] : counting k-mers
Elapsed time: 412.911s

Clipped poly-A tails from 0 transcripts
Building rank-select dictionary and saving to disk done
Elapsed time: 0.432678s
Writing sequence data to file . . . done
Elapsed time: 3.01218s
Building suffix array . . . FAILURE: return code from sais() was -1

Could you please give me a clue as to how I might resolve the issue?

Thank you Amin

rob-p commented 8 years ago

Hi @aminzia,

Could you provide a link to the exact file on which you're trying to build the index? The command usage looks right, but the error is coming from the call to the external library that builds the suffix array. That's a (well-tested) external piece of code, so I imagine something strange is going on here. Though I don't think it's the cause of the issue, you should also make sure you're using the latest version of Sailfish (https://github.com/kingsfordgroup/sailfish/releases/tag/v0.9.1). The output you give above is from at least 1-2 versions ago, because we've since stopped using the old bias correction methodology, and output of the form:

[2016-01-20 19:06:03.058] [jointLog] [info] computeBiasFeatures( {
[2016-01-20 19:06:03.058] [jointLog] [info] [gencode.fa]
[2016-01-20 19:06:03.058] [jointLog] [info] , ./sailfish/gencode/bias_feats.txt, false, 96

should no longer be generated.

rob-p commented 8 years ago

@aminzia --- any luck with this?

aminzia commented 8 years ago

I just used the latest binary version, SailfishBeta-0.9.1_CentOS5, and the indexing was successful. I am currently running "quant" to see if this binary works on our cluster. Will keep you posted.

Thanks a lot for the followup. Best, Amin

aminzia commented 8 years ago

The "quant" step turned out to fail with a segmentation fault. Any suggestions? Thanks so much.

./sailfish/SailfishBeta-0.9.1_CentOS5/bin/sailfish quant -i ./gencode -l ISF -1 M_1.fastq -2 M_2.fastq -o M.ISF.transcripts_quant -p 8

sailfish (quasi-mapping-based) v0.9.1

[ program ] => sailfish
[ command ] => quant
[ index ] => { ./gencode }
[ libType ] => { ISF }
[ mates1 ] => { M_1.fastq }
[ mates2 ] => { M_2.fastq }
[ output ] => { M_meth.ISF.transcripts_quant }
[ threads ] => { 8 }

Logs will be written to M_meth.ISF.transcripts_quant/logs
[2016-01-22 10:19:31.459] [jointLog] [info] parsing read library format
there is 1 lib
Loading 64-bit quasi index
[2016-01-22 10:19:31.543] [stderrLog] [info] Loading Suffix Array
[2016-01-22 10:19:31.544] [stderrLog] [info] Loading Position Hash
[2016-01-22 10:19:31.543] [jointLog] [info] Loading Quasi index
[2016-01-22 10:27:03.129] [stderrLog] [info] Loading Transcript Info
[2016-01-22 10:36:33.685] [stderrLog] [info] Loading Rank-Select Bit Array
[2016-01-22 10:37:01.921] [stderrLog] [info] There were 297 set bits in the bit array
[2016-01-22 10:37:14.490] [stderrLog] [info] Computing transcript lengths
[2016-01-22 10:37:14.490] [stderrLog] [info] Waiting to finish loading hash
Index contained 297 targets
Loaded targets

[2016-01-22 18:26:08.374] [stderrLog] [info] Done loading index
[2016-01-22 18:26:08.374] [jointLog] [info] done
Segmentation fault

rob-p commented 8 years ago

Just out of curiosity, I noticed:

Loading 64-bit quasi index

and

There were 297 set bits in the bit array

both appear during index loading. This suggests that (1) the size of your reference is > 2^31 nucleotides and (2) only 297 transcripts are recognized in the reference. Did you by any chance index the genome rather than the transcriptome? Sailfish (like most transcript-level quantification tools) should be used with the transcript sequences rather than the genome sequence. So, for example, if you're using gencode and working in human, you'd use ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.pc_transcripts.fa.gz to quantify protein-coding transcripts, or ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.transcripts.fa.gz to quantify all transcripts.
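The reasoning above (a 64-bit index implies a reference longer than 2^31 nucleotides, and 297 set bits implies only 297 recognized targets) can be checked directly on the FASTA file before indexing. Below is a minimal Python sketch that is not part of Sailfish; the function names and the 1 Mb mean-record-length cutoff are hypothetical choices made for illustration, and only the 2^31 threshold comes from the comment above.

```python
import gzip
from typing import Iterable, Tuple

# Threshold from the thread: references longer than 2^31 nucleotides
# lead Sailfish to load a 64-bit index.
LARGE_REFERENCE_THRESHOLD = 2 ** 31

# Hypothetical cutoff (an assumption): transcripts are rarely longer than
# ~100 kb, while chromosomes are tens to hundreds of Mb.
GENOME_LIKE_MEAN_LENGTH = 1_000_000


def summarize_fasta_lines(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of records, total sequence length) for FASTA text."""
    records = 0
    total_bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records += 1  # each header line starts a new record
        else:
            total_bases += len(line)  # sequence lines may be wrapped
    return records, total_bases


def summarize_fasta(path: str) -> Tuple[int, int]:
    """Open a plain or gzip-compressed FASTA file and summarize it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_fasta_lines(handle)


def looks_like_genome(records: int, total_bases: int) -> bool:
    """Heuristic: a huge reference or very long records suggest a genome."""
    if total_bases > LARGE_REFERENCE_THRESHOLD:
        return True
    return records > 0 and total_bases / records > GENOME_LIKE_MEAN_LENGTH
```

For example, calling `summarize_fasta("gencode.v24.transcripts.fa.gz")` on a downloaded transcript file should report many thousands of records with short mean lengths, whereas a genome FASTA would show only a few hundred records and billions of nucleotides, matching the pattern in the log.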

rob-p commented 8 years ago

Is there any update on this? Is the issue that the genome, rather than transcriptome, was being indexed?

aminzia commented 8 years ago

The transcripts are indexed successfully, but the quantification still fails. I assume it's because I am using the binary version, which might not be fully compatible with our cluster, and so far I have not spent time compiling the source on our cluster. I will give you an update when I can actually compile it.

Thanks so much for following up. Best, Amin