Closed fklirono closed 5 years ago
There are 58 sequences that are empty
seqtk comp human_hg19_circRNAs_putative_spliced_sequence.fa.gz | awk '$2==0 {print NR}' | wc -l
The first one is on sequence number 92510. kallisto index stops reading through the fasta file once it encounters an empty sequence, but it should just skip it. I'll fix this in the next release
If you remove the empty sequences kallisto will index the rest. Note that if you are working with circular rna you might want to circularize the fasta file based on the kmer length by adding the first k-1 bases of the fasta sequence to the end. That way kallisto will find k-mers that span the junction as well.
Thanks for the very fast response! I also noticed the empty sequences just now when I started looking into this.
Thanks also for the suggestion to circularize the sequences. I am experimenting with kallisto and Oxford nanopore long reads for circRNAs. Any more suggestions would be very welcome!
Hello,
kallisto (0.44.0) seems to be silently dropping sequences from the index.
Working example:
Is there a reason why some sequences are not indexed?
Code to reproduce example: