ibest / ARC

Assembly by Reduced Complexity (ARC)
Apache License 2.0
41 stars 5 forks source link

"KeyError" on failed target assembly after previous iteration killed for that target #45

Open atcg opened 9 years ago

atcg commented 9 years ago

I'm running into an issue where I get a python KeyError for targets that fulfill the following conditions:

1). Target assembly fails in iteration 1. Then "Writing reads as contigs." 2). Target assembly killed (spades times out) on iteration 2. Then "Writing contigs from previous iteration." 3). Target recruits fewer reads in iteration 3 than in 2. Target assembly fails in iteration 3.

evan@maven:/mnt/Data1/Gary/ARC$ grep Contig40947 ARC--try2.log 
[2015-02-05 08:54:19,745 INFO 21734] Sample: ALL5 target: 36948|Contig40947 iteration: 1 Split 180 reads in 0.0773031711578 seconds
[2015-02-05 08:54:56,345 INFO 21744] Sample: ALL5 target: 36948|Contig40947 iteration: 1 Assembly failed after 7.53394508362 seconds
[2015-02-05 08:58:46,348 INFO 21761] Sample: ALL5 target: 36948|Contig40947 finishing target..
[2015-02-05 08:58:46,348 INFO 21761] Sample: ALL5 target: 36948|Contig40947 iteration: 1 Assembly reports status: assembly_failed.
[2015-02-05 08:58:46,348 INFO 21761] Sample ALL5 target 36948|Contig40947: Writing reads as contigs.
[2015-02-05 10:47:54,027 INFO 21740] Sample: ALL5 target: 36948|Contig40947 iteration: 2 Split 411178 reads in 240.437613964 seconds
[2015-02-05 12:56:23,086 WARNING 21751] Sample: ALL5 target: 36948|Contig40947 Assembly killed after 7200.12375808 seconds.
[2015-02-05 12:56:23,109 INFO 21751] Sample: ALL5 target: 36948|Contig40947 iteration: 2 Assembly killed after 7200.14657593 seconds
[2015-02-06 11:39:30,387 INFO 21740] Sample: ALL5 target: 36948|Contig40947 finishing target..
[2015-02-06 11:39:30,387 INFO 21740] Sample: ALL5 target: 36948|Contig40947 iteration: 2 Assembly reports status: assembly_killed.
[2015-02-06 11:40:16,442 INFO 21740] Sample: ALL5 target: 36948|Contig40947 iteration: 2 Writing contigs from previous iteration.
[2015-02-08 10:39:37,060 INFO 21736] Sample: ALL5 target: 36948|Contig40947 iteration: 3 Setting last_assembly to True
[2015-02-08 10:39:37,060 INFO 21736] Sample: ALL5 target: 36948|Contig40947 iteration: 3 Split 2511 reads in 2.27638602257 seconds
[2015-02-08 10:40:18,630 INFO 21753] Sample: ALL5 target: 36948|Contig40947 iteration: 3 Assembly failed after 41.5684621334 seconds
[2015-02-09 08:02:30,358 INFO 21752] Sample: ALL5 target: 36948|Contig40947 finishing target..
[2015-02-09 08:02:30,358 INFO 21752] Sample: ALL5 target: 36948|Contig40947 iteration: 3 Assembly reports status: assembly_failed.
[2015-02-09 08:02:30,358 INFO 21752] Sample ALL5 target 36948|Contig40947 did not incorporate any more reads, no more mapping will be done

KeyError: '36948|Contig40947'

Bottom of the ARC output:

[2015-02-09 08:02:30,358 INFO 21752] Sample: ALL5 target: 36948|Contig40947 finishing target..
[2015-02-09 08:02:30,358 INFO 21752] Sample: ALL5 target: 36948|Contig40947 iteration: 3 Assembly reports status: assembly_failed.
[2015-02-09 08:02:30,358 INFO 21752] Sample ALL5 target 36948|Contig40947 did not incorporate any more reads, no more mapping will be done
[2015-02-09 08:02:30,584 ERROR 21752] Traceback (most recent call last):

  File "/home/evan/anaconda/lib/python2.7/site-packages/ARC/process_runner.py", line 62, in run
    self.launch()

  File "/home/evan/anaconda/lib/python2.7/site-packages/ARC/process_runner.py", line 43, in launch
    job.runner()

  File "/home/evan/anaconda/lib/python2.7/site-packages/ARC/runners/base.py", line 58, in runner
    self.start()

  File "/home/evan/anaconda/lib/python2.7/site-packages/ARC/runners/finisher.py", line 135, in start
    self.write_target(target, target_folder, outf=fin_outf, finished=True)

  File "/home/evan/anaconda/lib/python2.7/site-packages/ARC/runners/finisher.py", line 291, in write_target
    targetLength=self.params['summary_stats'][target]['targetLength'],

KeyError: '36948|Contig40947'

[2015-02-09 08:02:30,584 ERROR 21752] An unhandled exception occurred
[2015-02-09 08:02:30,585 ERROR 21684] Terminating processes
[2015-02-09 08:02:30,728 ERROR 21684] ARC.app unexpectedly terminated
[2015-02-09 08:02:30,732 INFO 21684] process shutting down

Any thoughts? Thanks very much! Evan

samhunter commented 9 years ago

This looks like a bug in the way that I handle finishing a target in this weird special case. That target should not have continued to iteration 3. I will try to reproduce it with some test data and see if I can figure out why ARC is doing this. Thanks for the bug report.

atcg commented 9 years ago

Thanks Sam. It's possible it has something to do with the assembly being killed at the end of the iteration, while ARC is just waiting for the last assemblies to finish. I've noticed when that happens, spades keeps running in the background to finish the assembly (even after ARC has given up on it and proceeds to start dumping the contigs for the next iteration).

I'm not sure that is the case for that particular target that got the keyerror, though.

beanurn commented 6 years ago

I have just come across this problem as well. The following may or may not be relevant: a previous run with the same read set and the same reference finished successfully and had sloppymapping=False. The run that crashed had this parameter set to True.

Beate