Closed by ssnn-airr 6 years ago.
Original comment by David Koppstein (Bitbucket: dkoppstein, GitHub: dkoppstein).
Whoops, I must have mistyped that somewhere. Thank you for the clarification; it runs now. I think we're still seeing the other errors in sibling processes, though, and will keep reporting them as they come up. Cheers
Original comment by Anonymous.
I've seen it a number of times too over the last few months, but never reproducibly.
Original comment by David Koppstein (Bitbucket: dkoppstein, GitHub: dkoppstein).
I'm also getting this error. Mine occurs during AssemblePairs, but the failure still seems to happen when calling muscle.
+ /usr/bin/time -o Runtime.log -a -f ''\''%C\t%E\t%P\t%Mkb'\''' nice AssemblePairs.py reference --exec /usr/bin/muscle --maxhits 100 --minident 0.5 --evalue 1e-5 -1 141124AbV_D14-8159_R2_sequence_subsampled_10000_fusionprimers-pass_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq -2 141124AbV_D14-8159_R1_sequence_subsampled_10000_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq --1f CONSCOUNT --2f CONSCOUNT PRCONS -r /home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta --log AssemblePairs-reference.log --nproc 2 --failed
Error processing sequence with ID: ATTTTCAGATGTCT_GTGTTG.
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
PID 20432: Error in sibling process detected. Cleaning up.
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
PID 20428: Error in sibling process detected. Cleaning up.
    result = process_func(data, **process_args)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
    stitch = assemble_func(head_seq, tail_seq, **assemble_args)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
    usearch_exec=usearch_exec)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
    stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
  File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmp9p5Sj8', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpjMGMZP', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
Error processing sequence with ID: GGACTATAGGTAACTAA_TGATAT.
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
    result = process_func(data, **process_args)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
    stitch = assemble_func(head_seq, tail_seq, **assemble_args)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
    usearch_exec=usearch_exec)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
    stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
  File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmpWzJqgu', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpKNiUSd', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
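The tracebacks above only report "exit status 1". Because `check_output` is called with `stderr=STDOUT`, whatever the executable printed before failing is stored on the exception's `output` attribute, so surfacing it would show, for example, what `/usr/bin/muscle` printed when handed the `-ublast` arguments. A minimal sketch of such a wrapper, in Python 3; `run_tool` is a hypothetical helper, not part of pRESTO:

```python
import subprocess
from subprocess import STDOUT, CalledProcessError


def run_tool(cmd):
    """Run cmd (a list of arguments, no shell) and return combined stdout/stderr.

    On a non-zero exit, raise an error whose message includes the tool's own
    output, not just the exit status.
    """
    try:
        out = subprocess.check_output(cmd, stderr=STDOUT, shell=False)
    except CalledProcessError as e:
        # e.output holds the captured bytes because stderr was merged into stdout
        text = e.output.decode(errors="replace") if e.output else ""
        raise RuntimeError(
            "Command %r failed with exit status %d:\n%s" % (cmd, e.returncode, text)
        ) from e
    return out.decode(errors="replace")
```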
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
I haven't been able to reproduce the problem on my end, but I just changed how muscle is called in AlignSets (removed the shell invocation and changed the buffering), which, in my imagination at least, might help.
Let me know if you still run into this error.
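For illustration only (this is not the actual AlignSets change), a call without a shell that feeds sequences on stdin and reads the result with `communicate()` might look like the sketch below. `communicate()` reads stdout and stderr concurrently, which avoids the deadlock that can occur when a child fills a pipe buffer. `run_aligner` is a hypothetical helper, and the muscle options are left to the caller because they are not given in this thread:

```python
import subprocess


def run_aligner(exec_path, fasta_text, extra_args=()):
    """Pipe FASTA text to an aligner's stdin and return its stdout as text.

    The command is an argument list (no shell), and communicate() drains
    stdout and stderr together, so large outputs cannot block the child.
    """
    cmd = [exec_path, *extra_args]
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, err = proc.communicate(fasta_text)
    if proc.returncode != 0:
        raise RuntimeError("%s exited with %d: %s" % (exec_path, proc.returncode, err))
    return out
```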
Original comment by David Koppstein (Bitbucket: dkoppstein, GitHub: dkoppstein).
Hi, just wanted to say that we ran into this again recently. I think it's a memory issue in the child processes, since the affected barcodes were by far the most common in the pair-pass file. We're currently applying a band-aid by reserving a whole node for the job; we'll see if that helps. If so, we probably haven't hit it lately because we've mostly been running AbPair, which downsamples before this step, whereas we don't downsample before this step for AbSeq.
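To check whether failures line up with the largest barcode groups, one could count reads per barcode in the pair-pass file. A minimal sketch, assuming pRESTO-style headers with `|`-delimited `FIELD=value` annotations and plain 4-line FASTQ records; `count_barcodes` is a hypothetical helper, and the field name would be whatever is passed to `--bf` (`DB_MB` in the AlignSets call quoted in this thread):

```python
from collections import Counter


def count_barcodes(fastq_lines, field="BARCODE", delim="|", sep="="):
    """Count reads per value of one annotation field in FASTQ headers.

    Assumes headers like '@READ1|BARCODE=ACGT|PRIMER=X' and 4-line records.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:  # only the first line of each record is a header
            continue
        # Drop the leading '@' and the read ID, then scan the annotations
        for token in line.rstrip("\n").lstrip("@").split(delim)[1:]:
            key, _, value = token.partition(sep)
            if key == field:
                counts[value] += 1
    return counts
```

For example, `count_barcodes(open(path), field="DB_MB").most_common(10)` would list the ten largest groups.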
Original comment by David Koppstein (Bitbucket: dkoppstein, GitHub: dkoppstein).
To add to my previous comment: it may have been sporadic before because the AlignSets job may or may not have been sharing a node with another high-memory job at the time. Just speculation at this point, though.
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
Hey @dkoppstein, thanks. Are you using the 32-bit or 64-bit version of muscle? I'll take a look at the memory usage and fix anything I can within the Python parts. If the memory limit is being hit within muscle, then I suspect the only solution will be to add another wrapper for CD-HIT, or swarm, or something similar. I'd really like to start porting bits and pieces to SeqAn soon, so that might actually be the best solution if it contains a suitable algorithm.
Please keep me posted, and I'll try to look at this soon. This week and next will be a little tough, though.
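One way to see whether muscle itself approaches the memory limit is to record the peak resident set size of the child process, the same quantity that `/usr/bin/time -f '%M'` logs in the calls quoted in this thread. A minimal POSIX-only sketch; `peak_child_rss_kb` is a hypothetical helper, not part of pRESTO:

```python
import resource
import subprocess


def peak_child_rss_kb(cmd):
    """Run cmd to completion and return the peak RSS of waited-for children.

    getrusage(RUSAGE_CHILDREN).ru_maxrss is the largest peak among all
    terminated children of this process so far (kilobytes on Linux, bytes
    on macOS), so run it in a fresh process to isolate a single command.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```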
Original comment by Anonymous.
I have since deleted the exact file I was analyzing. I uploaded a similar one that I was processing in parallel, which did not produce the error. I doubt that matters, though: I restarted the analysis on the original file and it went fine the second time, so the error may not be file-dependent. File: https://s3.amazonaws.com/abvitro-abpair/abpair_analysis/150330_BB/150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq
The call was: /usr/bin/time -o $RUNTIME -a -f '%C\t%E\t%P\t%Mkb' nice AlignSets.py muscle -s 150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq --exec $MUSCLE_PATH --bf DB_MB --nproc 8
Original comment by Anonymous.
Luckily, since the error is not reproducible, there is always the option of rerunning the exact same script, and it will likely get through. So no immediate fix is needed; I mainly wanted to bring it to your attention. I'll keep you posted if it happens again.
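Since a rerun usually gets through, a pipeline script could automate that rerun. A minimal sketch of a generic retry wrapper; `run_with_retries` is hypothetical, and because it reruns the whole command it only makes sense for steps that are safe to repeat from scratch (for example, steps that overwrite their outputs):

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=0.0):
    """Run cmd, retrying on a non-zero exit, up to `attempts` tries in total.

    Returns the 1-based attempt number that succeeded; re-raises the last
    CalledProcessError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            subprocess.run(cmd, check=True)
            return attempt
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise
            time.sleep(delay)  # optional pause before the next try
```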
Original comment by Julian Zhou (Bitbucket: jqz, GitHub: julianqz).
It'd be really nice to have some sort of built-in checkpointing so that, if anything goes wrong, one doesn't have to start all over again. I'm 90% done with AlignSets after almost a day, but it looks like I'm gonna hit the wall-time limit (should have set that longer too...) and will have to start all over again :/
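Nothing in this thread suggests AlignSets has such an option, so one workaround is to split the input into chunks, process them separately, and skip chunks that already finished. A minimal sketch of that bookkeeping, assuming the input has already been split into chunks that do not break up barcode groups; `process_with_checkpoints` and `process_chunk` are hypothetical names:

```python
import os


def process_with_checkpoints(chunks, out_dir, process_chunk):
    """Process chunks in order, skipping ones finished in an earlier run.

    Each result is written to a temporary file and atomically renamed to
    chunk_<i>.out when complete, so a job killed mid-chunk leaves no partial
    output and a restart resumes at that chunk. Returns the indices that
    were processed during this call.
    """
    os.makedirs(out_dir, exist_ok=True)
    done_now = []
    for i, chunk in enumerate(chunks):
        final = os.path.join(out_dir, "chunk_%05d.out" % i)
        if os.path.exists(final):
            continue  # finished in a previous run
        tmp = final + ".tmp"
        with open(tmp, "w") as fh:
            fh.write(process_chunk(chunk))
        os.replace(tmp, final)  # atomic rename within one filesystem
        done_now.append(i)
    return done_now
```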
Original comment by Armando Olivieri (Bitbucket: Armando Olivieri).
Hi, I ran into this error too and resolved it by using MUSCLE version 3.8.31 (https://wiki.anunna.wur.nl/index.php/Muscle_3.8.31).
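Since the MUSCLE version turned out to matter here, a pipeline could check it before running. A minimal sketch that parses the version from MUSCLE's banner text; it assumes the 3.8-style banner (e.g. "MUSCLE v3.8.31 by Robert C. Edgar"), and how the banner is obtained from the installed build is left as an assumption. `parse_muscle_version` is a hypothetical helper:

```python
import re


def parse_muscle_version(text):
    """Extract (major, minor, patch) from a MUSCLE version banner.

    Returns None if no version number is found; a missing patch
    component is reported as 0.
    """
    m = re.search(r"\bv?(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)
```

A check such as `parse_muscle_version(banner) == (3, 8, 31)` would then pin the version reported to work above.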
Original report by Anonymous.
This happened in my run during an AlignSets step, after getting through ~75% of the input. Two other nodes running the script got through it fine.
I've seen this 'Error in sibling process detected.' previously, also during the AlignSets call.
I am running this on Linux on AWS EC2. I attached the core file it dumped after the error. Happy to provide more information. brianbelmont@abvitro.com
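The "Error in sibling process detected. Cleaning up." message points to a common multiprocessing pattern: once one worker exits with an error, the remaining workers are stopped. A minimal sketch of that general pattern (not pRESTO's actual code; `run_workers` is hypothetical), which illustrates how a single failed child can end the whole step. It uses the `fork` start method, so it is POSIX-only:

```python
import multiprocessing as mp


def run_workers(target, args_list, poll=0.05):
    """Run target(*args) in one process per entry of args_list.

    If any worker exits with a non-zero code, terminate the remaining
    workers and return the index of the failed one; return None if all
    workers succeed.
    """
    ctx = mp.get_context("fork")
    procs = [ctx.Process(target=target, args=args) for args in args_list]
    for p in procs:
        p.start()
    running = set(range(len(procs)))
    while running:
        for i in sorted(running):
            procs[i].join(timeout=poll)
            if procs[i].exitcode is None:
                continue  # still running
            running.discard(i)
            if procs[i].exitcode != 0:
                # A sibling failed: clean up every worker still running
                for j in running:
                    procs[j].terminate()
                for j in running:
                    procs[j].join()
                return i
    return None
```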