griffithlab / rnaseq_tutorial

Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.

sequences dropped from the index #34

Closed fklirono closed 5 years ago

fklirono commented 5 years ago

Hello,

kallisto (0.44.0) seems to be silently dropping sequences from the index.

Working example:

Is there a reason why some sequences are not indexed?

Code to reproduce example:

wget -q -O - 'http://www.circbase.org/download/human_hg19_circRNAs_putative_spliced_sequence.fa.gz' | gzip -d -c > human_hg19_circRNAs_putative_spliced_sequence.fa

grep -c '^>' human_hg19_circRNAs_putative_spliced_sequence.fa

kallisto index -i human_hg19_circRNAs_putative_spliced_sequence.fa.fai human_hg19_circRNAs_putative_spliced_sequence.fa

kallisto inspect human_hg19_circRNAs_putative_spliced_sequence.fa.fai
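
This thread does not establish why the two counts differ. As a rough diagnostic sketch, the FASTA file can be summarized before indexing to rule out two candidate explanations: records shorter than the k-mer length (kallisto's default is 31) and records whose sequence is identical to an earlier one. Both are assumptions to check, not confirmed kallisto behavior, and `fasta_summary` is a hypothetical helper that is not part of kallisto.

```shell
# Hypothetical helper (not part of kallisto): summarize a FASTA file so its
# record count can be compared with the target count kallisto reports.
# $1: FASTA file; $2: k-mer length (optional, defaults to 31)
fasta_summary() {
    awk -v k="${2:-31}" '
        # Account for the record collected so far, if any.
        function flush() {
            if (name == "") return
            total++
            if (length(seq) < k) short++   # shorter than one k-mer
            if (seen[seq]++) dup++         # same sequence as an earlier record
        }
        /^>/ { flush(); name = $0; seq = ""; next }   # header starts a new record
             { seq = seq $0 }                         # join wrapped sequence lines
        END  {
            flush()
            printf "records=%d shorter_than_k=%d duplicate_sequences=%d\n", total, short, dup
        }
    ' "$1"
}
```

For the file above this would be called as `fasta_summary human_hg19_circRNAs_putative_spliced_sequence.fa`. If both `shorter_than_k` and `duplicate_sequences` are 0, neither assumption explains the discrepancy.
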
fklirono commented 5 years ago

Sorry for opening this issue here. I should have opened it directly on the kallisto GitHub. I had not had enough coffee yet to wake up.