pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

finding pseudoalignments for the reads ...Segmentation fault (core dumped) #105

Open aungthurhahein opened 8 years ago

aungthurhahein commented 8 years ago

When trying to get a pseudobam file, kallisto crashes with a core dump. The machine is an Ubuntu server with x86-64 architecture.

kallisto quant -i Trinity.fasta.kallisto_idx -l 590 -s 160.95 -o out --pseudobam --single ge50.fasta > out.sam

pmelsted commented 8 years ago

What is the error reported? We fixed an issue in v0.42.5 which could be the cause of this. Can you run this on the latest version, 0.42.5?

aungthurhahein commented 8 years ago

Kallisto version is 0.42.4 and this is the error message:

finding pseudoalignments for the reads ...Segmentation fault (core dumped) 

I will download v0.42.5 and try again. I will get back to you with the outcome.

aungthurhahein commented 8 years ago

I tried with kallisto ver. 0.42.5 with the following command and the error still persists.

Command:

kallisto quant -l 605 -s 136 -i Trinity.fasta.kallisto_idx -o aln_out --pseudobam --single lib.fasta

Program halted with the following error:

@SQ     SN:c10356_g10356_i1     LN:1195
@PG     ID:kallisto     PN:kallisto     VN:0.42.5
Segmentation fault (core dumped)

pmelsted commented 8 years ago

In your case you seem to have only a single sequence in your index. Can you confirm that this is what you expected?

Can you run kallisto quant without the pseudobam? I'm just trying to isolate whether the problem is with the pseudobam or with other parts.

Can you also report what is written to stderr when you run your command as kallisto quant -l 605 -s 136 -i Trinity.fasta.kallisto_idx -o aln_out --pseudobam --single lib.fasta > lib.sam

aungthurhahein commented 8 years ago

The index file has more than one sequence. I just reported the end of the stdout.
I can run kallisto quant without generating pseudobam successfully.

This is the output of both stdout and stderr:

[quant] fragment length distribution is truncated gaussian with mean = 605, sd = 136
[index] k-mer length: 31
[index] number of targets: 10,357
[index] number of k-mers: 5,947,007
[index] number of equivalence classes: 24,998
[quant] running in single-end mode
[quant] will process file 1: /colossus/home/anuphap/EST/EST_lib_IDs/pm/slect_bytissues_pm_chula/PM82_wTempLibID_04092014.txt.PmTwI.seqID.fasta
[quant] finding pseudoalignments for the reads ...@HD   VN:1.0
@SQ     SN:c0_g0_i1     LN:216
@SQ     SN:c1_g1_i1     LN:374
@SQ     SN:c2_g2_i1     LN:197
...
@SQ     SN:c10354_g10354_i1     LN:594
@SQ     SN:c10355_g10355_i1     LN:682
@SQ     SN:c10356_g10356_i1     LN:1195
@PG     ID:kallisto     PN:kallisto     VN:0.42.5
Segmentation fault (core dumped)

Also, "core.xxxx" file is written inside the working directory.

pmelsted commented 8 years ago

The sequences you are aligning have the ending .fasta. Are they truly FASTA entries and not FASTQ? Pseudoalignment outputs SAM files, which are required to have a quality string, so kallisto (probably) fails because the FASTA input has no quality string.
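A quick way to answer the FASTA-versus-FASTQ question is to look at the first non-empty line of the input. The sketch below is only an illustration: `sniff_sequence_format` is a hypothetical helper name, and it assumes the usual convention that FASTA records start with `>` and FASTQ records start with `@`.

```python
import gzip


def sniff_sequence_format(path):
    """Guess 'fasta' or 'fastq' from the first non-empty line of a file.

    Hypothetical helper: relies only on the leading character of the
    first record ('>' for FASTA, '@' for FASTQ).
    """
    # Transparently handle gzip-compressed input based on the file name.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "unknown"  # empty file
```

Calling `sniff_sequence_format("lib.fasta")` on the file from this thread would be expected to return `"fasta"` if it really has no quality strings.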

I'll have to check for this a bit more carefully when doing pseudoalignment.

kallisto never uses the quality values, so you can supply a dummy value, essentially converting the FASTA file to a FASTQ file.

You can try this by just converting the first few sequences of the input file to FASTQ.
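The conversion suggested above could look roughly like the following. This is a minimal sketch, not part of kallisto: `fasta_to_fastq` is a hypothetical function name, and the constant quality character `I` is an arbitrary placeholder, chosen only because the quality values are said not to be used.

```python
def fasta_to_fastq(fasta_lines, dummy_quality="I"):
    """Yield one FASTQ record (as a string) per FASTA record.

    Hypothetical helper: every base gets the same placeholder quality
    character, and multi-line FASTA sequences are joined into one line.
    """
    def emit(name, chunks):
        seq = "".join(chunks)
        return f"@{name}\n{seq}\n+\n{dummy_quality * len(seq)}\n"

    name, chunks = None, []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield emit(name, chunks)  # flush the previous record
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield emit(name, chunks)  # flush the last record

# Example use (file names are placeholders):
#   with open("lib.fasta") as src, open("lib.fastq", "w") as dst:
#       dst.writelines(fasta_to_fastq(src))
```

To test on only the first few sequences, the generator can be truncated with `itertools.islice` before writing.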

aungthurhahein commented 8 years ago

Yes. I confirmed that the .fasta file has no quality values. I didn't mention it before because I didn't expect that it could be the cause of the issue.

I will test with the .fastq file format and report the outcome soon.

maubarsom commented 6 years ago

I also ran into a segfault when generating the pseudobam, but due to a slightly different problem. I was running kallisto using process substitution to deal with an interleaved paired-end file, e.g.

kallisto quant -t 8 -i kallisto.idx -o my_sample --pseudobam <(seqtk seq -1 interleaved.fq) <(seqtk seq -2 interleaved.fq)

Kallisto runs perfectly fine without the --pseudobam flag, but it crashes if I request the pseudobam.

I figured that generating the pseudobam requires re-reading the fastq files, which is not possible with process substitution, so I tried doing the split into two files beforehand, and then the segfault does not happen (it runs fine).

It would be nice to add this to the docs at least :). Another nice-to-have would be support for interleaved paired-end files :)
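The "split beforehand" workaround can also be done without seqtk. The sketch below is an illustration under stated assumptions: `split_interleaved` is a hypothetical function name, and it assumes strict four-line FASTQ records with mates alternating (read 1, then read 2).

```python
from itertools import islice


def split_interleaved(src, out1, out2):
    """Split an interleaved FASTQ stream into two mate files.

    Hypothetical helper: assumes each record is exactly four lines and
    that mates alternate. Returns the number of read pairs written.
    """
    lines = iter(src)
    pairs = 0
    while True:
        block = list(islice(lines, 8))  # one pair = 2 records x 4 lines
        if not block:
            return pairs
        if len(block) != 8:
            raise ValueError("input does not contain complete read pairs")
        out1.writelines(block[:4])  # first mate
        out2.writelines(block[4:])  # second mate
        pairs += 1

# Example use (file names are placeholders):
#   with open("interleaved.fq") as src, \
#        open("R1.fq", "w") as r1, open("R2.fq", "w") as r2:
#       split_interleaved(src, r1, r2)
```

The two resulting regular files can then be passed to kallisto quant in place of the process substitutions, so they can be read more than once.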

Evi-050 commented 4 years ago

Hello, I face this problem: "[  bam] writing pseudoalignments to BAM format .. Segmentation fault" and I have no idea how to fix it. I have Smart-seq2 single reads, dual indexed. This is what the fastq reads look like:

@NB551291:160:H55CJBGXF:1:11101:12947:14932 1:N:0:TAAGGCGA+GCGATCTA
GGCGTGTCCCGCGCGTGTGGGGGGAACCTCCGCGTCGGTGTTCCCCCGCCGGGTCCGCCCCCCGGGCCGCGGTTTT
+
AAAA/EAAAEEEA/EEAEEEAEE/E/EEEEEEAEA/EEEEEEEEEEEEEAEEEE/E/EAEEAEEE6AE/</EA///

I ran this command:

[user@vm-129-49 mouse1.fastq_gz]$ kallisto quant -i /ad/vlachou/scRNAseq.2/kallisto_analysis/gencode.vM24.transcripts.idx --output-dir /ad/vlachou/scRNAseq.2/kallisto_analysis/kallisto_quant/gencode_indexed/mouse1 --pseudobam --genomebam --gtf /vlachou/scRNAseq.2/kallisto_analysis/gencode.vM24.annotation.gtf.gz --single -l 530 -s 150 -t 16 *fastq.gz

This is the output:

[quant] fragment length distribution is truncated gaussian with mean = 530, sd = 150
[index] k-mer length: 31
[index] number of targets: 142,552
[index] number of k-mers: 120,672,054

[quant] finding pseudoalignments for the reads ... done
[quant] processed 482,819,438 reads, 208,880,499 reads pseudoaligned
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1,273 rounds
[  bam] writing pseudoalignments to BAM format .. Segmentation fault

I tried the same with Ensembl as the reference, but I get the same problem.

If anyone could help me out, it would be great! Thanks

kopardev commented 4 years ago

Any idea if this issue has been resolved yet? I am also getting something very similar:

[  bam] writing pseudoalignments to BAM format .. /spin1/swarm/kopardevn/M0tDGHNewa/cmd.10: line 1: 12564 Segmentation fault      ( kallisto quant -i mm10_M21 -o TreatmentB_S72 --bias --plaintext --fusion --rf-stranded -t 56 --pseudobam --genomebam --gtf genes.gtf -c mm10.genome trim/TreatmentB_S72.R1.trim.fastq.gz trim/TreatmentB_S72.R2.trim.fastq.gz )

Evi-050 commented 4 years ago

So, personally, I went with STAR since I was not in a hurry, but someone in another post suggested going back to an older version that works. Frankly, I didn't try that. Also, if I remember correctly, when I removed "--pseudobam --genomebam --gtf genes.gtf" and ran, for example, "kallisto quant -i index -o output --single -l 200 -s 20 file1.fastq.gz file2.fastq.gz file3.fastq.gz", it worked.

Very good luck!

redst4r commented 4 years ago

This keeps happening to me too in kallisto 0.46.2:

[quant] finding pseudoalignments for the reads ...

[quant] done
[quant] processed 250,960,675 reads, 156,761,018 reads pseudoaligned
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1,513 rounds
[  bam] writing pseudoalignments to BAM format .. [1]    2673 segmentation fault

It works when I remove the --genomebam flag, but I'd really like to get the BAM file out of this.