pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

Are the 'clipped off poly-A tail' and 'replaced non-ACGUT characters' warnings OK? #422

Open Bianzh1024 opened 8 months ago

Bianzh1024 commented 8 months ago

I downloaded the transcriptome FASTA from UCSC, https://hgdownload.cse.ucsc.edu/goldenpath/mm9/bigZips/mrna.fa.gz . When I ran kallisto index to build the index, it printed two warnings:

[build] loading fasta file mrna.fa.gz
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10) from 32479 target sequences
[build] warning: replaced 196300 non-ACGUT characters in the input sequence with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 3001158 contigs and contains 263628489 k-mers

The build succeeded. Why did it report the clipped poly-A tails and the replaced non-ACGUT characters? Is the index still correct?
I hope you can give me some advice. Thanks! @pmelsted


Yenaled commented 8 months ago

Yes, it's fine.
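For anyone who wants to see where the two counts in the log come from, here is a minimal Python sketch that tallies the same two things in a FASTA file: how many sequences end in a long run of A's, and how many characters are outside A, C, G, U, T (for example N). This is not kallisto's code. The helper names (`read_fasta`, `summarize`, `summarize_gz`) are made up for illustration. Reading "longer than 10" as 11 or more trailing A's and ignoring case are both assumptions, so the numbers may not match the log exactly.

```python
import gzip
import re

# Assumption: "longer than 10" is read as 11 or more trailing A's.
POLY_A_TAIL = re.compile(r"A{11,}$", re.IGNORECASE)
# Any character that is not one of A, C, G, U, T (case-insensitive), e.g. N.
NON_ACGUT = re.compile(r"[^ACGUT]", re.IGNORECASE)


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open text-mode FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize(handle):
    """Return (sequences with a long poly-A tail, total non-ACGUT characters)."""
    tails = 0
    odd_chars = 0
    for _, seq in read_fasta(handle):
        if POLY_A_TAIL.search(seq):
            tails += 1
        odd_chars += len(NON_ACGUT.findall(seq))
    return tails, odd_chars


def summarize_gz(path):
    """Convenience wrapper for a gzip-compressed FASTA such as mrna.fa.gz."""
    with gzip.open(path, "rt") as handle:
        return summarize(handle)
```

Calling `summarize_gz("mrna.fa.gz")` on the downloaded file should give counts in the same ballpark as the two warnings. Both messages describe cleanup that kallisto applies to the input while building the index, which is consistent with the build finishing normally.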