pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

Quantifying abundances: Segmentation fault on all kallisto versions above 0.48.0 #465

Open CodingKaiser opened 5 days ago

CodingKaiser commented 5 days ago

Hi there

We are having trouble with the newer versions of kallisto, namely all versions >= 0.50.0 with the new index. Here is what I'm doing:

$ ./kallisto_0.50.1.sif kallisto index -i transcripts.idx transcripts.fa
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

[build] loading fasta file transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 643 target sequences
KmerStream::KmerStream(): Start computing k-mer cardinality estimations (1/2)
KmerStream::KmerStream(): Start computing k-mer cardinality estimations (1/2)
KmerStream::KmerStream(): Finished
CompactedDBG::build(): Estimated number of k-mers occurring at least once: 100584348
CompactedDBG::build(): Estimated number of minimizer occurring at least once: 24727070
CompactedDBG::filter(): Processed 208205791 k-mers in 106256 reads
CompactedDBG::filter(): Found 100421622 unique k-mers
CompactedDBG::filter(): Number of blocks in Bloom filter is 687590
CompactedDBG::construct(): Extract approximate unitigs (1/2)
CompactedDBG::construct(): Extract approximate unitigs (2/2)
CompactedDBG::construct(): Closed all input files

CompactedDBG::construct(): Splitting unitigs (1/2)

CompactedDBG::construct(): Splitting unitigs (2/2)
CompactedDBG::construct(): Before split: 529414 unitigs
CompactedDBG::construct(): After split (1/1): 529414 unitigs
CompactedDBG::construct(): Unitigs split: 1262
CompactedDBG::construct(): Unitigs deleted: 0

CompactedDBG::construct(): Joining unitigs
CompactedDBG::construct(): After join: 479225 unitigs
CompactedDBG::construct(): Joined 50347 unitigs
[build] building MPHF
[build] creating equivalence classes ...
[build] target de Bruijn graph has k-mer length 31 and minimizer length 23
[build] target de Bruijn graph has 479225 contigs and contains 100509978 k-mers

This is then followed by a quantification step:

$ ./kallisto_0.50.1.sif kallisto quant -i transcripts.idx -o ./foo -t 1 --bias --bootstrap-samples 10 --seed 42 --rf-stranded trimmed_R1.fastq.gz trimmed_R2.fastq.gz
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 106,256
[index] number of k-mers: 100,509,978
[quant] running in paired-end mode
[quant] will process pair 1: trimmed_R1.fastq.gz
                             trimmed_R2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] learning parameters for sequence specific bias
[quant] processed 91,234 reads, 77,892 reads pseudoaligned
[quant] estimated average fragment length: 170.219
[   em] quantifying the abundances ...Segmentation fault

I have ruled out the containers as the cause: installing from source produces the same error.

Running the same sequence of commands for version 0.48.0 works just fine.

We are running Debian 12.

Yenaled commented 5 days ago

The --bias option is no longer supported.

CodingKaiser commented 4 days ago

Thanks a lot, it works now!

So --bias is no longer supported, but kallisto still accepts it? I noticed that passing any other unsupported argument, such as --foo, at least produces a "quant: unrecognized option '--foo'" warning, whereas passing --bias is completely silent.

Yenaled commented 4 days ago

It's no longer displayed in the command-line help menu. However, the old bias implementation is still kept in the codebase in case we want to support it again in the future.

CodingKaiser commented 4 days ago

I understand. I now also see the corresponding notes in the v0.50.0 release notes. Still, I think it would help to print a warning when the parameter is passed, to avoid confusion for people upgrading from older versions.
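The "quant: unrecognized option '--foo'" wording matches the error message glibc's getopt_long prints for unknown long options, which suggests (but does not confirm) that kallisto parses options this way. If so, --bias is silent because it is still registered in the option table, even though it was dropped from the help text. Below is a minimal, hypothetical sketch of the suggested warning. It is not kallisto's actual parser: `QuantArgs`, `parse_quant_args`, `kOptBias`, and the warning text are illustrative names, and it assumes glibc (as on Debian 12), where setting `optind = 0` fully resets getopt's internal state.

```cpp
#include <cassert>
#include <getopt.h>
#include <string>
#include <vector>

// Hypothetical result of parsing a tiny subset of `kallisto quant` options.
struct QuantArgs {
    std::string index;                  // -i / --index
    bool bias_requested = false;        // --bias: accepted but ignored
    std::vector<std::string> warnings;  // messages the caller would print to stderr
};

// Value for a long-only option; anything above 255 cannot clash with a short option.
constexpr int kOptBias = 256;

// Hypothetical sketch, not kallisto's real code.
QuantArgs parse_quant_args(int argc, char* argv[]) {
    assert(argc >= 1);  // argv[0] is used as the program name in messages
    QuantArgs args;
    static const option long_options[] = {
        {"index", required_argument, nullptr, 'i'},
        // Still registered, so getopt_long accepts it without complaint,
        // even if it no longer appears in the help text.
        {"bias", no_argument, nullptr, kOptBias},
        {nullptr, 0, nullptr, 0},
    };
    opterr = 0;  // suppress getopt's own messages; we collect them instead
    optind = 0;  // glibc: 0 forces a full re-initialisation of getopt state
    int c;
    while ((c = getopt_long(argc, argv, "i:", long_options, nullptr)) != -1) {
        switch (c) {
        case 'i':
            args.index = optarg;
            break;
        case kOptBias:
            // The proposed fix: tell the user the flag has no effect.
            args.bias_requested = true;
            args.warnings.push_back(
                "[quant] warning: --bias is no longer supported and will be ignored");
            break;
        case '?':
            if (optopt == 0) {
                // Unknown long option: glibc has already advanced optind past it.
                args.warnings.push_back(std::string(argv[0]) +
                                        ": unrecognized option '" +
                                        argv[optind - 1] + "'");
            } else {
                // Unknown short option or missing required argument.
                args.warnings.push_back(std::string(argv[0]) +
                                        ": invalid option or missing argument for -" +
                                        static_cast<char>(optopt));
            }
            break;
        default:
            break;
        }
    }
    return args;
}
```

With this table, `quant -i transcripts.idx --bias` still runs, but it now yields one explicit warning instead of nothing, while `--foo` is reported the same way as before. Keeping the option registered rather than deleting it preserves backward compatibility for existing pipelines and leaves room to re-enable the old bias code later.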