immcantation / presto

pRESTO is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq). pRESTO is a bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
https://presto.readthedocs.io
GNU Affero General Public License v3.0

error during AlignSets #15

Closed ssnn-airr closed 6 years ago

ssnn-airr commented 9 years ago

Original report by Anonymous.


This happened in my run during an AlignSets step, after getting through ~75% of the method. Two other nodes running the script got through it fine.

I've seen this 'Error in sibling process detected.' previously, also during the AlignSets call.

I am running this in linux on AWS EC2. I attached the core file that it put out after the error. Happy to provide more information. brianbelmont@abvitro.com

#!shell

stdin: is not a tty
Error processing sequence set with ID: TCCTTGCAATTAATTC_ACTGCT.
PID 27950:  Error in sibling process detected. Cleaning up.
Process Process-9:
PID 27941:  Error in sibling process detected. Cleaning up.
PID 27946:  Error in sibling process detected. Cleaning up.
PID 27943:  Error in sibling process detected. Cleaning up.
PID 27944:  Error in sibling process detected. Cleaning up.
PID 27947:  Error in sibling process detected. Cleaning up.
PID 27948:  Error in sibling process detected. Cleaning up.
PID 27945:  Error in sibling process detected. Cleaning up.
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/ebsdata/scripts/presto//AlignSets.py", line 265, in processASQueue
    align_list = align_func(seq_list, **align_args)
  File "/ebsdata/scripts/presto//AlignSets.py", line 75, in alignSeqSet
    align = AlignIO.read(stdout_handle, 'fasta')
  File "/usr/local/lib/python2.7/dist-packages/biopython-1.64-py2.7-linux-x86_64.egg/Bio/AlignIO/__init__.py", line 427, in read
    raise ValueError("No records found in handle")
ValueError: No records found in handle
PID 27942:  Error in sibling process detected. Cleaning up.
ssnn-airr commented 6 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


As this seems to be a child process memory issue, I'm closing this in favor of #6.

ssnn-airr commented 9 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Thanks. If it's not a reliable error it may take a bit to fix, as I need to reproduce it. I'll start some tests and get back to you when I track it down.

ssnn-airr commented 9 years ago

Original comment by David Koppstein (Bitbucket: dkoppstein, GitHub: dkoppstein).


Whoops, I must have mistyped that somewhere. Thank you for the clarification, it runs now. I think we've still been seeing the other errors in sibling processes though, and will continue to report them when they come up. Cheers

ssnn-airr commented 9 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


My suspicion is that it's some sort of pipe timing issue with EC2, as what appears to be happening is that the output from MUSCLE is empty. I'll test though.
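
An empty aligner output currently surfaces only as Biopython's generic "No records found in handle". As a minimal sketch, assuming the aligner's stdout has already been captured as a string, the output could be checked before parsing so that the error names the failing set. The function names `count_fasta_records` and `require_alignment` are hypothetical and not part of pRESTO.

```python
def count_fasta_records(text):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def require_alignment(stdout_text, set_id):
    """Raise a descriptive error when the aligner returned no records.

    `set_id` is the barcode/set identifier, included so the failing set
    can be found in the log. Hypothetical helper, not pRESTO API.
    """
    n_records = count_fasta_records(stdout_text)
    if n_records == 0:
        raise ValueError(
            "Aligner returned no FASTA records for set %s "
            "(output was %d characters long)" % (set_id, len(stdout_text))
        )
    return n_records
```

Calling `require_alignment(stdout_text, set_id)` just before the FASTA parsing step would distinguish "the aligner produced nothing" from a malformed alignment, which may help narrow down a pipe timing problem.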

ssnn-airr commented 9 years ago

Original comment by Anonymous.


I've seen it a number of times too over the last few months, but never reproducibly.

ssnn-airr commented 9 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Thanks. I've been using Biopython's muscle interface. I'll probably just have to write my own.

ssnn-airr commented 9 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


I’ll debug and try to fix the issue Monday. Can you share the input file with me (via s3)? And the command line arguments to AlignSets?

ssnn-airr commented 9 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


This should be usearch instead of muscle. Can you retry with "--exec /usr/bin/usearch"? (Or wherever you have usearch installed.)

ssnn-airr commented 9 years ago

Original comment by David Koppstein (Bitbucket: dkoppstein, GitHub: dkoppstein).


I'm also getting this error. Mine is during AssemblePairs, but the issue still seems to be when using muscle.

+ /usr/bin/time -o Runtime.log -a -f ''\''%C\t%E\t%P\t%Mkb'\''' nice AssemblePairs.py reference --exec /usr/bin/muscle --maxhits 100 --minident 0.5 --evalue 1e-5 -1 141124AbV_D14-8159_R2_sequence_subsampled_10000_fusionprimers-pass_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq -2 141124AbV_D14-8159_R1_sequence_subsampled_10000_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq --1f CONSCOUNT --2f CONSCOUNT PRCONS -r /home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta --log AssemblePairs-reference.log --nproc 2 --failed
Error processing sequence with ID: ATTTTCAGATGTCT_GTGTTG.
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
PID 20432:  Error in sibling process detected. Cleaning up.
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
PID 20428:  Error in sibling process detected. Cleaning up.
    result = process_func(data, **process_args)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
    stitch = assemble_func(head_seq, tail_seq, **assemble_args)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
    usearch_exec=usearch_exec)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
    stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
  File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmp9p5Sj8', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpjMGMZP', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
Error processing sequence with ID: GGACTATAGGTAACTAA_TGATAT.
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
    result = process_func(data, **process_args)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
    stitch = assemble_func(head_seq, tail_seq, **assemble_args)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
    usearch_exec=usearch_exec)
  File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
    stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
  File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmpWzJqgu', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpKNiUSd', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
ssnn-airr commented 9 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Haven't had any runs of AlignSets show the error on our cluster yet. I might just write a new muscle wrapper anyway and have y'all test that.

ssnn-airr commented 9 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Assuming no news is good news. Please reopen if the issue crops up again.

ssnn-airr commented 9 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


I haven't been able to reproduce the problem on my end, but I just made some changes to how muscle is called in AlignSets which (in my imagination) might help (removed the shell invocation and changed the buffering).

Let me know if you still encounter this error?
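
For readers unfamiliar with what "removing the shell invocation" looks like, here is a rough sketch of the general pattern, not the actual change made in AlignSets. It assumes the aligner reads sequences on stdin and writes to stdout; `run_with_stdin` is a hypothetical name, and a child Python process that echoes its input stands in for MUSCLE.

```python
import subprocess
import sys


def run_with_stdin(cmd, input_text):
    """Run `cmd` (a list of arguments) without a shell, feed `input_text`
    on stdin, and return (returncode, stdout, stderr) as text."""
    proc = subprocess.Popen(
        cmd,
        shell=False,                 # no intermediate /bin/sh process
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,     # text-mode pipes
    )
    # communicate() writes all input, closes stdin, and reads both output
    # pipes until EOF, which avoids deadlocks from full pipe buffers.
    stdout_text, stderr_text = proc.communicate(input_text)
    return proc.returncode, stdout_text, stderr_text


# Stand-in for an aligner: a child Python process that echoes stdin.
# A real call would pass the aligner executable and its arguments instead;
# those arguments are deliberately not shown here.
echo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
code, out, err = run_with_stdin(echo_cmd, ">seq1\nACGT\n")
```

Returning the exit code and stderr alongside stdout makes it possible to log why a run produced no alignment instead of only noticing the empty output later.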

ssnn-airr commented 8 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Popped up again. Reopening.

ssnn-airr commented 8 years ago

Original comment by David Koppstein (Bitbucket: dkoppstein, GitHub: dkoppstein).


Hi, just wanted to say that we ran into this again recently. I think it's a memory issue in the child processes, since the affected barcodes were by far the most common in the pair-pass file. We're currently applying a band-aid by reserving a whole node for the job; we'll see if that helps. If this is the case, we probably haven't run into it recently because we've been mostly doing AbPair, which downsamples prior to this step, whereas we do not downsample prior to AbSeq.
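
To illustrate the kind of downsampling meant here, the following is a rough sketch that caps the number of reads per barcode before alignment. It is not the AbPair implementation; `downsample_sets`, the `(set_id, sequence)` input shape, and the cap of 100 are all assumptions made for the example.

```python
import random
from collections import defaultdict


def downsample_sets(records, max_per_set, seed=0):
    """Group (set_id, sequence) pairs by set_id and cap each group.

    Groups larger than `max_per_set` are reduced by random sampling
    without replacement; smaller groups are returned unchanged.
    """
    rng = random.Random(seed)        # fixed seed for reproducible runs
    groups = defaultdict(list)
    for set_id, seq in records:
        groups[set_id].append(seq)
    return {
        set_id: rng.sample(seqs, max_per_set) if len(seqs) > max_per_set else seqs
        for set_id, seqs in groups.items()
    }


# Toy data: one very common barcode and one rare barcode.
reads = [("BC1", "ACGT")] * 500 + [("BC2", "ACGA")] * 3
capped = downsample_sets(reads, max_per_set=100)
```

Capping only the largest sets bounds the size of any single alignment job, which is where the memory pressure would come from if the most common barcodes are the ones failing.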

ssnn-airr commented 8 years ago

Original comment by David Koppstein (Bitbucket: dkoppstein, GitHub: dkoppstein).


Just to add to my previous comment, the reason it may have been sporadic before is that the AlignSets job may or may not have been sharing a node with another high-memory job at the time. Just speculation at this point though.

ssnn-airr commented 8 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Hey @dkoppstein, thanks. Are you using the 32-bit or 64-bit version of muscle? I'll take a look at the memory usage, and fix anything I can within the python parts. If the memory limit is being hit within muscle, then I suspect the only solution will be to add another wrapper for CD-HIT, or swarm, or something. I'd really like to start porting bits and pieces to SeqAn soon, so that might actually be the best solution if it contains a suitable algorithm.

Please keep me posted. And I'll try to look at this soon. This week and next will be a little tough though.

ssnn-airr commented 9 years ago

Original comment by Anonymous.


I have since deleted the exact file I was analyzing. I uploaded another similar one that I was processing in parallel, which did not produce the error (I doubt it matters, since I restarted the analysis on the original file and it went fine the second time, so the error may not be file-dependent). File: https://s3.amazonaws.com/abvitro-abpair/abpair_analysis/150330_BB/150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq

call was: /usr/bin/time -o $RUNTIME -a -f '%C\t%E\t%P\t%Mkb' nice AlignSets.py muscle -s 150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq --exec $MUSCLE_PATH --bf DB_MB --nproc 8

ssnn-airr commented 9 years ago

Original comment by Anonymous.


Luckily, since it is not a reliable error, there is always an option of just rerunning the exact same script and it'll likely get through. So no immediate fix is needed, but mainly wanted to bring it to your attention. I can still keep you posted if/when this happens again.

ssnn-airr commented 7 years ago

Original comment by Julian Zhou (Bitbucket: jqz, GitHub: julianqz).


It'd be really nice if there could be some sort of built-in checkpoint mechanism so that if anything happens one doesn't have to start all over again. I'm at 90% done with AlignSets after almost a day, but it looks like I'm gonna hit the wall time limit (should have set that to be longer too..) and would have to start all over again :/
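
As a rough illustration of what such a checkpoint could look like, here is a minimal sketch that records each finished set ID in a text file and skips those IDs on the next run. This is not an existing pRESTO feature; `load_done`, `process_with_checkpoint`, and the one-ID-per-line file format are hypothetical.

```python
import os
import tempfile


def load_done(path):
    """Return the set of IDs already recorded in the checkpoint file."""
    if not os.path.exists(path):
        return set()
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def process_with_checkpoint(items, work, path):
    """Apply `work` to each (set_id, data) pair not yet checkpointed.

    Each finished ID is appended and flushed immediately, so an
    interrupted run loses at most the set that was in progress.
    """
    done = load_done(path)
    processed = []
    with open(path, "a") as handle:
        for set_id, data in items:
            if set_id in done:
                continue             # finished in an earlier run
            work(set_id, data)
            handle.write(set_id + "\n")
            handle.flush()
            processed.append(set_id)
    return processed


# Simulate an interrupted run (first two sets) followed by a restart.
checkpoint = os.path.join(tempfile.mkdtemp(), "done.txt")
items = [("BC1", "ACGT"), ("BC2", "ACGA"), ("BC3", "ACGG")]
first = process_with_checkpoint(items[:2], lambda i, d: None, checkpoint)
second = process_with_checkpoint(items, lambda i, d: None, checkpoint)
```

The sketch only tracks which sets are finished. A real implementation would also need to write each set's aligned output as it completes, and would have to coordinate the checkpoint file across the worker processes.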

ssnn-airr commented 2 years ago

Original comment by Armando Olivieri (Bitbucket: [Armando Olivieri](https://bitbucket.org/Armando Olivieri)).


Hi, I ran into this error too and resolved it by using MUSCLE version 3.8.31 (https://wiki.anunna.wur.nl/index.php/Muscle_3.8.31).