Open amanpatel101 opened 6 years ago
@amanpatel101: The shortest possible FASTA record that contains no non-ACGUT characters but still makes kallisto
report > 0 non-ACGUT characters when this single transcript is indexed would likely help to understand and fix the issue. :wink:
I encountered a similar problem. In my case the cause was that I had used a genome reference FASTA, when it should have been a transcript reference FASTA.
Kallisto seems to detect a large number of non-ACGUT characters when none exist. When I try to create an index using kallisto index, this is part of the output:

[build] warning: replaced 5334758 non-ACGUT characters in the input sequence with pseudorandom nucleotides

The counting k-mers step also takes an exorbitantly long time.
I'm perplexed because comparably large indices have been created in a fraction of the time and with very few ambiguous characters. I have tracked down a region of my database file where there is supposed to be one non-ACGUT character, and there certainly isn't one.
Any advice would be greatly appreciated. Thanks in advance!
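
One way to check the claim independently of kallisto is to count the offending characters per record yourself. The following is only a minimal sketch, not kallisto's own logic: the function name `count_non_acgut` is hypothetical, and it assumes that the comparison is case-insensitive and that every character on a sequence line other than the trailing newline counts. That means hidden characters such as carriage returns or spaces are reported too.

```python
from collections import Counter

# Assumption: A, C, G, U and T are accepted in either case.
ALLOWED = set("ACGUT")


def count_non_acgut(lines):
    """Map each FASTA record name to a Counter of its non-ACGUT characters.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Records without offending characters are left out of the result.
    """
    offenders = {}
    name = None
    for line in lines:
        line = line.rstrip("\n")  # keep "\r" and spaces so they are counted
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            continue
        if name is None:
            continue  # ignore anything before the first header line
        bad = Counter(ch for ch in line.upper() if ch not in ALLOWED)
        if bad:
            offenders.setdefault(name, Counter()).update(bad)
    return offenders


# Example with an in-memory record; "N" appears twice in t1.
example = [">t1 some description\n", "ACGTNN\n", "acgu\n"]
print(count_non_acgut(example))
```

To run it on a real file, open the file with `open(path, newline="")` so that Windows line endings are not silently translated, and pass the handle to the function. If this reports nothing for a record that kallisto complains about, that record would be a good candidate for the minimal example requested above.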