pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

no reads pseudoalign when reads are the same length as transcripts in index or of length 3? #184

Open winni2k opened 5 years ago

winni2k commented 5 years ago

I have put a small example in [this gist](https://gist.github.com/winni2k/64efa2e354a70a72d8a70a5ac373cc49).

When I run run.sh, I get the following output:

0 reads pseudoalign

[build] loading fasta file transcripts.fa
[build] k-mer length: 3
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 3 contigs and contains 3 k-mers 

[quant] fragment length distribution is truncated gaussian with mean = 4, sd = 0.1
[index] k-mer length: 3
[index] number of targets: 2
[index] number of k-mers: 3
[index] number of equivalence classes: 3
[quant] running in single-end mode
[quant] will process file 1: single.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 8 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds

0 reads pseudoalign in this case as well

[quant] fragment length distribution is truncated gaussian with mean = 4, sd = 0.1
[index] k-mer length: 3
[index] number of targets: 2
[index] number of k-mers: 3
[index] number of equivalence classes: 3
[quant] running in single-end mode
[quant] will process file 1: single_v3.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 8 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds

8 reads pseudoalign

[build] loading fasta file transcripts_v2.fa
[build] k-mer length: 3
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 3 contigs and contains 5 k-mers 

[quant] fragment length distribution is truncated gaussian with mean = 4, sd = 0.1
[index] k-mer length: 3
[index] number of targets: 2
[index] number of k-mers: 5
[index] number of equivalence classes: 3
[quant] running in single-end mode
[quant] will process file 1: single.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 8 reads, 8 reads pseudoaligned
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds

Summary:

I don't understand why the reads don't pseudoalign in the first two cases. Is this a bug or a feature?

mfansler commented 5 years ago

I don't know the internals well enough to say what causes this, but it has something to do with how the fragment length is used to constrain possible alignments. You can set the fragment length to 1 to remove the constraint. I checked, and with that setting all reads pseudoalign in each of the cases you provided.
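For anyone wanting to try this workaround, here is a rough sketch of the single-end `kallisto quant` invocation with the fragment length set to 1, assembled in Python. The helper `build_quant_command`, the index name `transcripts.idx`, and the output directory `out` are hypothetical placeholders, not taken from the gist. Keeping the standard deviation at 0.1 (the value shown in the logs above) is also an assumption.

```python
import shlex


def build_quant_command(index, output_dir, reads, fragment_length=1, sd=0.1):
    """Assemble a single-end `kallisto quant` command as an argument list.

    Hypothetical helper for illustration only. It uses the documented
    kallisto options -i (index), -o (output directory), --single,
    -l (estimated mean fragment length) and -s (standard deviation).
    """
    return [
        "kallisto", "quant",
        "-i", index,
        "-o", output_dir,
        "--single",
        "-l", str(fragment_length),
        "-s", str(sd),
        reads,
    ]


# Index and output names are assumed; single.fq is the read file from the logs.
cmd = build_quant_command("transcripts.idx", "out", "single.fq")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where kallisto is installed.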

Zepeng-Mu commented 3 years ago

I think k-mer length must be smaller than read length?
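As a quick check on the arithmetic behind this guess: a read of length L contains L - k + 1 k-mers, so a read exactly as long as k still contains one k-mer, and only a read shorter than k contains none. The sketch below illustrates that count. The sequences are made up for illustration and are not the ones in the gist, and whether kallisto needs more than one k-mer per read is not settled in this thread.

```python
def kmers(seq, k):
    """Return every length-k substring of seq, in order of position."""
    # range() is empty when len(seq) < k, so short reads yield no k-mers.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical reads, with k = 3 as in the logs above.
for read in ["AC", "ACG", "ACGT"]:
    print(read, len(kmers(read, 3)))
```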