hall-lab / speedseq

A flexible framework for rapid genome analysis and interpretation
MIT License

sambamba-view: unable to write to stream #123

Status: Open. pmonnahan opened this issue 6 years ago.

pmonnahan commented 6 years ago

Hi,

I am attempting to use speedseq as part of the lumpy express pipeline, but the program crashes during speedseq align. At some point during bwa mem, sambamba-view throws an error: 'unable to write to stream'.

The input data are simulated paired-end reads from the SInC simulator and are of modest size (two gzipped FASTQ files, 6.3 GB each). The reads were simulated from chromosome 1 of maize. I requested 24 cores and a total of 62 GB of RAM.

samblaster: Version 0.1.22
samblaster: Inputting from stdin
samblaster: Outputting to stdout
samblaster: Opening /panfs/roc/scratch/pmonnaha/Maize/b73_sim_bams/b73aln/IND81/IND81/disc_pipe for write.
samblaster: Opening /panfs/roc/scratch/pmonnaha/Maize/b73_sim_bams/b73aln/IND81/IND81/spl_pipe for write.
[M::main_mem] read 2400000 sequences (240000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 292625, 7, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (464, 498, 532)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (328, 668)
[M::mem_pestat] mean and std.dev: (498.05, 49.78)
[M::mem_pestat] low and high boundaries for proper pairs: (260, 736)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 2400000 reads in 2259.071 CPU sec, 95.935 real sec
samblaster: Loaded 267 header sequence entries.

. . .

[M::main_mem] read 2400000 sequences (240000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 294094, 5, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (464, 498, 532)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (328, 668)
[M::mem_pestat] mean and std.dev: (498.21, 49.81)
[M::mem_pestat] low and high boundaries for proper pairs: (260, 736)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 2400000 reads in 2244.084 CPU sec, 94.828 real sec
sambamba-view: unable to write to stream
samblaster: Unable to write to output file.

[fputs] Broken pipe
/bin/bash: line 1: 15137 Exit 1 /panfs/roc/groups/14/hirschc1/pmonnaha/software/speedseq//bin/bwa mem -t 24 -R '@RG\tID:IND81\tSM:IND81\tLB:IND81' /home/hirschc1/pmonnaha/misc-files/Zea_mays.AGPv4.dna.toplevel.fa /panfs/roc/scratch/pmonnaha/Maize/B73_reads/B73.chr1.8.fa_allele_1_S_0.0020_I_0.0001_C_2.00_1000_150000.fa_1_300_50_100.0_100.fq.convertReadNames.fq.gz /panfs/roc/scratch/pmonnaha/Maize/B73_reads/B73.chr1.8.fa_allele_1_S_0.0020_I_0.0001_C_2.00_1000_150000.fa_2_300_50_100.0_100.fq.convertReadNames.fq.gz
     15138 | /panfs/roc/groups/14/hirschc1/pmonnaha/software/speedseq//bin/samblaster --excludeDups --addMateTags --maxSplitCount 2 --minNonOverlap 20 --splitterFile /panfs/roc/scratch/pmonnaha/Maize/b73_sim_bams/b73aln/IND81/IND81/spl_pipe --discordantFile /panfs/roc/scratch/pmonnaha/Maize/b73_sim_bams/b73aln/IND81/IND81/disc_pipe
     15139 | /panfs/roc/groups/14/hirschc1/pmonnaha/software/speedseq//bin/sambamba view -S -f bam -l 0 /dev/stdin
     15140 Killed | /panfs/roc/groups/14/hirschc1/pmonnaha/software/speedseq//bin/sambamba sort -t 24 -m 58G --tmpdir=/panfs/roc/scratch/pmonnaha/Maize/b73_sim_bams/b73aln/IND81/IND81/full -o /panfs/roc/scratch/pmonnaha/Maize/b73_sim_bams/IND81.b73aln.bam /dev/stdin
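To see which stage of the bwa mem | samblaster | sambamba view | sambamba sort pipeline actually died, it can help to pull the per-process statuses out of the bash job report at the end of the log. In that report only sambamba sort is marked Killed. Bash prints "Killed" when a process is terminated by SIGKILL, which on shared clusters often comes from the kernel's OOM killer or the scheduler enforcing a memory limit; note that sambamba sort was started with -m 58G inside a 62 GB job. The "unable to write to stream" and "Broken pipe" messages from the upstream stages are consistent with their reader disappearing. This is one reading of the log, not a confirmed diagnosis. The minimal Python sketch below is hypothetical (the names Stage, parse_job_report and program_name are made up for illustration), and it assumes every command in the report starts with an absolute path.

```python
import os
import re
from typing import List, NamedTuple

# One process entry in a bash job report, e.g.
#   "/bin/bash: line 1: 15137 Exit 1 /path/to/bwa mem ..."
#   "     15140 Killed | /path/to/sambamba sort ..."
# Assumption: each command starts with an absolute path ("/...").
ENTRY_RE = re.compile(r"^(?:.*?: line \d+: )?\s*(\d+)\s+(.*?)\s*(?:\|\s*)?(/\S.*)$")


class Stage(NamedTuple):
    pid: int
    status: str   # e.g. "Exit 1" or "Killed"; "" when bash printed no status
    program: str  # binary name plus subcommand, e.g. "sambamba sort"


def program_name(command: str) -> str:
    """Return the binary's basename, plus its subcommand if one follows."""
    tokens = command.split()
    name = os.path.basename(tokens[0])
    if len(tokens) > 1 and not tokens[1].startswith("-"):
        name += " " + tokens[1]
    return name


def parse_job_report(report: str) -> List[Stage]:
    """Extract (pid, status, program) for every pipeline stage in the report."""
    stages = []
    for line in report.splitlines():
        match = ENTRY_RE.match(line)
        if match:  # lines such as "[fputs] Broken pipe" do not match and are skipped
            pid, status, command = match.groups()
            stages.append(Stage(int(pid), status, program_name(command)))
    return stages


if __name__ == "__main__":
    # Abbreviated from the log in this issue; paths and arguments are shortened.
    report = """\
[fputs] Broken pipe
/bin/bash: line 1: 15137 Exit 1 /speedseq/bin/bwa mem -t 24 ref.fa r1.fq.gz r2.fq.gz
     15138 | /speedseq/bin/samblaster --excludeDups --addMateTags
     15139 | /speedseq/bin/sambamba view -S -f bam -l 0 /dev/stdin
     15140 Killed | /speedseq/bin/sambamba sort -t 24 -m 58G /dev/stdin
"""
    for stage in parse_job_report(report):
        print(stage.pid, stage.status or "-", stage.program)
```

Run on the abbreviated report, the last line printed is "15140 Killed sambamba sort". Stages for which bash printed no status are shown as "-" rather than guessed.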

I'd very much appreciate any help you could provide on this.

Best, Patrick

ZhangRuiQian778 commented 6 years ago

Hi, I just encountered the same problem. I suspect there is a problem with the reference genome. Have you found a solution yet?

pmonnahan commented 6 years ago

Sorry, but I never found a solution. I gave up on using speedseq, and used bwa mem along with samtools instead.

ZhangRuiQian778 commented 6 years ago

@pmonnahan I used the goat reference genome, which includes a lot of nonchromosomal sequence (such as the mitochondrial genome and contigs not yet placed on chromosomes). I think this is the cause of the error. So I kept only the chromosome sequences when building the bwa index, and it worked (my study does not focus on the nonchromosomal sequence). I hope this helps.
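The workaround of building the bwa index from chromosome sequences only can be sketched as a small streaming FASTA filter. This is a minimal sketch, not part of speedseq: the function names (is_chromosome, filter_fasta, keep_chromosomes) and the chromosome naming pattern are hypothetical, and it reads uncompressed FASTA only. Naming differs between assemblies, and the samblaster log in this issue reports 267 header sequence entries for the maize reference, so it is worth listing the FASTA headers before choosing a pattern.

```python
import re
import sys
from typing import Callable, Iterable, Iterator

# Assumption: chromosome records are named "1", "2", ... (or "chr1", "chr2", ...)
# plus X and Y. Check the headers of your own reference and adjust this pattern.
CHROMOSOME_RE = re.compile(r"(?:chr)?(?:\d+|X|Y)")


def is_chromosome(record_id: str) -> bool:
    """True if a FASTA record ID looks like an assembled chromosome."""
    return CHROMOSOME_RE.fullmatch(record_id) is not None


def filter_fasta(lines: Iterable[str], keep: Callable[[str], bool]) -> Iterator[str]:
    """Yield header and sequence lines only for records whose ID passes keep()."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The record ID is the first whitespace-separated word of the header.
            fields = line[1:].split()
            keeping = bool(fields) and keep(fields[0])
        if keeping:
            yield line


def keep_chromosomes(in_path: str, out_path: str) -> None:
    """Stream in_path to out_path, dropping non-chromosome records."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, is_chromosome))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        keep_chromosomes(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} input.fa output.fa", file=sys.stderr)
```

Hypothetical usage: run the script on Zea_mays.AGPv4.dna.toplevel.fa to produce chroms.fa, index it with bwa index chroms.fa, and pass chroms.fa as the reference to speedseq align. Processing the file line by line keeps memory use small even for large genomes.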

pmonnahan commented 6 years ago

@ZhangRuiQian778 Wow, that seems to have fixed it for me. Thanks so much for the suggestion!!