pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

building Index aborted #426

Closed: ghost closed 7 months ago

ghost commented 7 months ago

Hello,

I am using kallisto 0.46.2 in a conda environment.

Code: kallisto index -i index /fasta/Homo_sapiens.GRCh38.dna.primary_assembly.fa

Error:
[build] loading fasta file /fasta/Homo_sapiens.GRCh38.dna.primary_assembly.fa
[build] k-mer length: 31
[build] warning: replaced 153901651 non-ACGUT characters in the input sequence with pseudorandom nucleotides
[build] counting k-mers ... Aborted

Any help is appreciated.

Yenaled commented 7 months ago

You need to index a transcriptome FASTA, not a genome FASTA. kallisto is a transcriptome-mapper, not a genome-mapper.

In fact, Ensembl already distributes transcriptome FASTAs if you don't want/need to build one yourself: https://ftp.ensembl.org/pub/release-111/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
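If it is unclear whether a FASTA is a genome or a transcriptome, a quick look at its contents before indexing can help. The warning in the log above (153901651 non-ACGUT characters replaced) is the kind of signal this check surfaces: a genome assembly typically has a small number of very long records with long runs of N, while a cDNA file has many comparatively short records. The sketch below is only an illustration; `fasta_summary` is a hypothetical helper, not part of kallisto, and it deliberately applies no threshold, leaving the interpretation to the reader.

```python
import gzip


def fasta_summary(path):
    """Return (n_records, n_bases, n_non_acgtu) for a plain or gzipped FASTA.

    Hypothetical helper for illustration; not part of kallisto.
    """
    # Pick the opener from the file extension; both are read in text mode.
    opener = gzip.open if str(path).endswith(".gz") else open
    n_records = 0
    n_bases = 0
    n_other = 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Each header line starts a new sequence record.
                n_records += 1
            else:
                n_bases += len(line)
                # Count everything that is not A, C, G, T or U (case-insensitive).
                n_other += sum(1 for c in line.upper() if c not in "ACGTU")
    return n_records, n_bases, n_other
```

Calling `fasta_summary("Homo_sapiens.GRCh38.cdna.all.fa.gz")` on the downloaded file returns the three counts as a tuple; a large third value relative to the second suggests the file is a genome assembly rather than a transcriptome.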

ghost commented 7 months ago

Thank you for the quick response and for pointing out the issue.

Using the transcriptome FASTA worked.