pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

Segmentation fault writing pseudoalignments to BAM #390

Closed by ybao3 1 year ago

ybao3 commented 1 year ago

Problem: I am trying to visualize the pseudoalignments with the following command:

kallisto quant -i transcripts.idx -b 30 -o kallisto_out --genomebam --gtf transcripts.gtf.gz --chromosomes chrom.txt reads_1.fastq.gz reads_2.fastq.gz

Output:

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 14
[index] number of k-mers: 22,118
Warning: 13 transcripts were defined in GTF file, but not in the index
[quant] running in paired-end mode
[quant] will process pair 1: reads_1.fastq.gz
                             reads_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 10,000 reads, 9,413 reads pseudoaligned
[quant] estimated average fragment length: 178.02
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds
[bstrp] running EM for the bootstrap: 30
[  bam] writing pseudoalignments to BAM format .. [1]    76610 segmentation fault  /kallisto/build/src/kallisto quant -i  -b 30

Tests: https://pachterlab.github.io/kallisto/starting
Files: https://github.com/pachterlab/kallisto/tree/master/test

I am on an M1 Mac, with dependencies installed via Homebrew. I built from source (last night) with all features enabled:

cmake .. -DZLIBNG=ON -DUSE_BAM=ON -DBUILD_FUNCTESTING=ON -DUSE_HDF5=ON

I previously installed kallisto via Homebrew, which works but has BAM disabled. I also tried installing via conda (conda install -c merv kallisto), which gives a segfault. That is 0.46.2, so I cloned 0.50 and built from source, but I still get this error. I saw that it has been fixed in 0.48.

Any suggestions?

Indexing looks fine:

> kallisto index -i transcripts.idx transcripts.fasta.gz

Output:

[build] loading fasta file transcripts.fasta.gz
[build] k-mer length: 31
KmerStream::KmerStream(): Start computing k-mer cardinality estimations (1/2)
KmerStream::KmerStream(): Start computing k-mer cardinality estimations (1/2)
KmerStream::KmerStream(): Finished
CompactedDBG::build(): Estimated number of k-mers occurring at least once: 22338
CompactedDBG::build(): Estimated number of minimizer occurring at least once: 5522
CompactedDBG::filter(): Processed 28144 k-mers in 14 reads
CompactedDBG::filter(): Found 22097 unique k-mers
CompactedDBG::filter(): Number of blocks in Bloom filter is 154
CompactedDBG::construct(): Extract approximate unitigs (1/2)
CompactedDBG::construct(): Extract approximate unitigs (2/2)
CompactedDBG::construct(): Closed all input files

CompactedDBG::construct(): Splitting unitigs (1/2)

CompactedDBG::construct(): Splitting unitigs (2/2)
CompactedDBG::construct(): Before split: 25 unitigs
CompactedDBG::construct(): After split (1/1): 25 unitigs
CompactedDBG::construct(): Unitigs split: 0
CompactedDBG::construct(): Unitigs deleted: 0

CompactedDBG::construct(): Joining unitigs
CompactedDBG::construct(): After join: 21 unitigs
CompactedDBG::construct(): Joined 4 unitigs
[build] building MPHF
[build] creating equivalence classes ... 
[build] target de Bruijn graph has k-mer length 31 and minimizer length 23
[build] target de Bruijn graph has 21 contigs and contains 22118 k-mers

kallisto quant -i transcripts.idx -o output -b 100 reads_1.fastq.gz reads_2.fastq.gz

Output:

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 14
[index] number of k-mers: 22,118
[quant] running in paired-end mode
[quant] will process pair 1: reads_1.fastq.gz
                             reads_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 10,000 reads, 9,413 reads pseudoaligned
[quant] estimated average fragment length: 178.02
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds
[bstrp] running EM for the bootstrap: 100

ybao3 commented 1 year ago

https://github.com/pachterlab/kallisto/releases/tag/v0.50.0

End of support for existing bulk RNAseq features

--bias, --fusion, --genomebam, and --pseudobam in kallisto quant and kallisto bus are no longer supported -- users should use v0.48.0 for use of these features.

Is this the reason?

Yenaled commented 1 year ago

Correct, pseudobam is no longer supported in the 0.50 series of releases.

Please use the 0.48 series for pseudobams.
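
For anyone scripting around this, a minimal sketch of a guard that checks the installed version before passing --genomebam or --pseudobam. The function name pseudobam_removed is hypothetical, and the sketch assumes that `kallisto version` prints a line ending in the version number (something like "kallisto, version 0.48.0") and that any release from 0.50 onward lacks these flags. It only encodes the cutoff stated above; it does not say whether a given older build handles BAM output correctly.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the given version string is in
# the 0.50 series or later, where --genomebam/--pseudobam are not supported.
pseudobam_removed() {
    version=${1#v}            # tolerate a leading "v", as in the tag v0.50.0
    major=${version%%.*}      # text before the first dot
    rest=${version#*.}        # text after the first dot
    minor=${rest%%.*}         # second numeric component
    [ "$major" -gt 0 ] || [ "$minor" -ge 50 ]
}

# Only query the binary if it is on PATH.
# Assumption: the last field of the `kallisto version` output is the version.
if command -v kallisto >/dev/null 2>&1; then
    installed=$(kallisto version | awk '{print $NF}')
    if pseudobam_removed "$installed"; then
        echo "kallisto $installed: use the 0.48 series for --genomebam/--pseudobam" >&2
    else
        echo "kallisto $installed: predates the 0.50 series"
    fi
fi
```

The comparison is split out into its own function so it can be reused without the binary being installed, for example in a pipeline's pre-flight checks.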

ybao3 commented 1 year ago

It worked on 0.48. Thanks!