pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

lr-kallisto quant-tcc seg fault with bulk ONT #463

Open sbresnahan opened 2 days ago

sbresnahan commented 2 days ago

Version: kallisto 0.51.1

I'm following the workflow outlined in issue 456 for using lr-kallisto with bulk ONT data. The kallisto bus, bustools sort, and bustools count steps complete without errors. However, the kallisto quant-tcc step crashes under LSF with "554689 Segmentation fault" shortly after printing "Processing sample/cell N".

I'm using a kallisto index with kmer-length=63, built from transcripts extracted with gffread from GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta and the gencode v45 gtf. An index built from the same transcripts with kmer-length=31 has no issues with kallisto quant using short reads.

bound-to-love commented 2 days ago

Hi Sean, since you are processing bulk data, it should only print "Processing sample/cell 0". Is this the case? Can you please post the full output?

sbresnahan commented 1 day ago

If I run with --threads=1, it is indeed only processing sample/cell 0 before the seg fault:

[index] k-mer length: 63
[index] number of targets: 252,723
[index] number of k-mers: 157,178,936
[index] number of equivalence classes loaded from file: 327,292
[tcc] Parsing transcript-compatibility counts (TCC) file as a matrix file
[tcc] Matrix dimensions: 72 x 327,292
[quant] Running EM algorithm...
[   em] reading priors from file ONT
[quant] Processing sample/cell 0
/home/stbresnahan/.lsbatch/1727389319.16590285.shell: line 39: 55903 Segmentation fault     (core dumped) kallisto quant-tcc -t 1 --long -p ONT -f ${DIR_OUT}/flens.txt -i kallisto_index/gencode_v45 -e ${DIR_OUT}/count.ec.txt -o ${DIR_OUT}/quant-tcc ${DIR_OUT}/count.mtx

However, if I set --threads to anything other than 1 (in this case, 12), it reports processing multiple samples/cells before the seg fault:

[index] k-mer length: 63
[index] number of targets: 252,723
[index] number of k-mers: 157,178,936
[index] number of equivalence classes loaded from file: 327,292
[tcc] Parsing transcript-compatibility counts (TCC) file as a matrix file
[tcc] Matrix dimensions: 72 x 327,292
[quant] Running EM algorithm...
[   em] reading priors from file ONT
[quant] Processing sample/cell 0quant] Processing sample/cell [quant] Processing sample/cell 2[quant] Processing sample/cell [quant] Processing sample/cell quant] Processing sample/cell 5
[quant] Processing sample/cell 3[quant] Processing sample/cell [quant] Processing sample/cell 6
[quant] Processing sample/cell 4
[quant] Processing sample/cell 77

[quant] Processing sample/cell 88
[[[

quant] Processing sample/cell 11
[quant] Processing sample/cell 9quant] Processing sample/cell [quant] Processing sample/cell 11uant] Processing sample/cell [quant] Processing sample/cell [quant] Processing sample/cell 1
0
0

/home/stbresnahan/.lsbatch/1727384386.16588742.shell: line 38: 3476442 Segmentation fault     (core dumped) kallisto quant-tcc -t 12 --long -p ONT -f ${DIR_OUT}/flens.txt -i kallisto_index/gencode_v45 -e ${DIR_OUT}/count.ec.txt -o ${DIR_OUT}/quant-tcc ${DIR_OUT}/count.mtx

This occurs regardless of whether I start the process with a single .fastq or multiple .fastq files.
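For anyone trying to reproduce this, the matrix dimensions reported in the log (72 x 327,292) can be checked independently of kallisto. The following is only a minimal sketch, not part of kallisto or bustools: the function names are hypothetical, and it assumes that count.mtx is a Matrix Market coordinate file and that count.ec.txt lists one equivalence class per non-blank line.

```python
import sys


def read_mtx_dims(handle):
    """Return (rows, cols, nonzeros) from the size line of a Matrix Market
    coordinate file. Lines starting with '%' are header/comment lines."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        rows, cols, nnz = (int(field) for field in line.split()[:3])
        return rows, cols, nnz
    raise ValueError("no size line found in matrix file")


def count_nonblank_lines(handle):
    """Count non-blank lines (assumed here: one equivalence class per line)."""
    return sum(1 for line in handle if line.strip())


# Hypothetical usage: python check_dims.py count.mtx count.ec.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as mtx:
        rows, cols, nnz = read_mtx_dims(mtx)
    with open(sys.argv[2]) as ec:
        n_ec = count_nonblank_lines(ec)
    print(f"matrix: {rows} rows x {cols} columns, {nnz} nonzero entries")
    print(f"equivalence classes in ec file: {n_ec}")
    if cols != n_ec:
        print("column count does not match the number of equivalence classes")
```

If the row count is greater than 1 for what is supposed to be a single bulk sample, that may be worth mentioning alongside the logs above.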