pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License
656 stars, 172 forks

kallisto index uses one thread instead of the 14 assigned to it during the "building MPHF" and "creating equivalence classes" steps when building an index from a genome. #460

Closed by warp-felinidae 2 months ago

warp-felinidae commented 2 months ago

Hello!

kallisto 0.51.0 uses only one thread instead of the 14 assigned to it. Is this expected? The CPU is an Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, with 256 GB of RAM. The FASTA file is approximately 6.5 GB.

Beginning: [image: photo_2024-09-12_17-50-10]
Now: [image]

Command:
nohup /datastore/01/m_matveeva/kallisto/build/src/kallisto index -i sce_genome.idx -t 14 Secale_cereale.Rye_Lo7_2018_v1p1p1.dna.toplevel.fa

Output:
[build] loading fasta file Secale_cereale.Rye_Lo7_2018_v1p1p1.dna.toplevel.fa
[build] k-mer length: 31
[build] warning: replaced 155746534 non-ACGUT characters in the input sequence with pseudorandom nucleotides
KmerStream::KmerStream(): Start computing k-mer cardinality estimations (1/2)
KmerStream::KmerStream(): Start computing k-mer cardinality estimations (1/2)
KmerStream::KmerStream(): Finished
CompactedDBG::build(): Estimated number of k-mers occurring at least once: 2313881712
CompactedDBG::build(): Estimated number of minimizer occurring at least once: 458715777
CompactedDBG::filter(): Processed 6735226869 k-mers in 8 reads
CompactedDBG::filter(): Found 2306160606 unique k-mers
CompactedDBG::filter(): Number of blocks in Bloom filter is 15817552
CompactedDBG::construct(): Extract approximate unitigs (1/2)
CompactedDBG::construct(): Extract approximate unitigs (2/2)
CompactedDBG::construct(): Closed all input files
CompactedDBG::construct(): Splitting unitigs (1/2)
CompactedDBG::construct(): Splitting unitigs (2/2)
CompactedDBG::construct(): Before split: 157113240 unitigs
CompactedDBG::construct(): After split (1/1): 157113240 unitigs
CompactedDBG::construct(): Unitigs split: 0
CompactedDBG::construct(): Unitigs deleted: 0
CompactedDBG::construct(): Joining unitigs
CompactedDBG::construct(): After join: 154889278 unitigs
CompactedDBG::construct(): Joined 2223962 unitigs
[build] building MPHF
[build] creating equivalence classes ...

Yenaled commented 2 months ago

Yes, certain steps of building the index cannot be parallelized and thus only use one thread.
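
As a rough illustration of why extra threads do not help during those steps, here is a minimal Python sketch of Amdahl's law. The parallel fractions below are hypothetical values chosen for illustration, not measurements of kallisto, and `amdahl_speedup` is a helper name made up for this example.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel.

    parallel_fraction: share of single-threaded runtime that can be
    parallelized (0.0 to 1.0). Hypothetical input, not measured from kallisto.
    threads: number of worker threads available (e.g. the value passed to -t).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of thread count;
    # only the parallel part is divided among the threads.
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Illustrative fractions only: 0.0 models a fully serial step.
    for fraction in (0.0, 0.5, 0.9):
        speedup = amdahl_speedup(fraction, threads=14)
        print(f"parallel fraction {fraction:.1f}: at most {speedup:.2f}x with 14 threads")
```

A step with a parallel fraction of 0.0 gains nothing from 14 threads, which is consistent with seeing a single busy core while a non-parallelizable step is running.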