pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License
648 stars 170 forks source link

Crash when bootstrap option used and no matches found #186

Open jakewendt opened 5 years ago

jakewendt commented 5 years ago

I downloaded the latest binary for my linux server and created a simple reference with 1 sequence in it.

> kallisto version
kallisto, version 0.44.0

> kallisto index --index myref myreference.fasta

[build] loading fasta file myreference.fasta
[build] k-mer length: 31
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 136 contigs and contains 151302 k-mers 

I then align a large paired end query.

> kallisto quant --threads 10 --index myref --output-dir ./query_1.fastq.gz query_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 1
[index] number of k-mers: 147,811
[index] number of equivalence classes: 1
[quant] running in paired-end mode
[quant] will process pair 1: query_1.fastq.gz
                             query_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 27,227,094 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
[quant] estimated average fragment length: 0
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds

Which finds no matches.

> cat ./abundance.tsv 
target_id   length  eff_length  est_counts  tpm
ref1    159378  159379  0   -nan

No big deal. However, if I add the bootstrap option ...

> kallisto quant -b 10 --threads 10 --index myref --output-dir ./ query_1.fastq.gz query_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 1
[index] number of k-mers: 147,811
[index] number of equivalence classes: 1
[quant] running in paired-end mode
[quant] will process pair 1: query_1.fastq.gz
                             query_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 27,227,094 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
[quant] estimated average fragment length: 0
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds
terminate called recursively
terminate called recursively
Aborted (core dumped)

> kallisto quant -b 10 --threads 10 --index myref --output-dir ./ query_1.fastq.gz query_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 1
[index] number of k-mers: 147,811
[index] number of equivalence classes: 1
[quant] running in paired-end mode
[quant] will process pair 1: query_1.fastq.gz
                             query_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 27,227,094 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
[quant] estimated average fragment length: 0
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds
terminate called after throwing an instance of 'std::domain_errorterminate called recursively
terminate called recursively
terminate called recursively
Aborted (core dumped)

> kallisto quant -b 10 --threads 10 --index myref --output-dir ./ query_1.fastq.gz query_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 1
[index] number of k-mers: 147,811
[index] number of equivalence classes: 1
[quant] running in paired-end mode
[quant] will process pair 1: query_1.fastq.gz
                             query_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 27,227,094 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
[quant] estimated average fragment length: 0
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds
terminate called after throwing an instance of 'std::domain_error'
terminate called recursively
terminate called recursively
  what():  nsamp must be -1 or >=1Aborted (core dumped)

... it crashes in slightly different ways. This has only happened when no matches are found and used the bootstrap option.

Am I doing something wrong? Is this a kallisto error or a library error?

jakewendt commented 5 years ago

I have also downloaded the github repository, compiled and duplicated this error. It is coming from src/Multinomial.hpp.

jakewendt commented 5 years ago

I commented out the only throw in the file on line 35, removing the crash with no apparent side effects.

What is the purpose of this throw?

suragnair commented 5 years ago

Your read length is probably smaller than 31, whereas the index is using 31-mers by default. Refer #76.