segway recovery (training mode) fails to copy over likelihood files

EricR86 commented 8 years ago

Original report (BitBucket issue) by Rachel Chan (Bitbucket: rcwchan).

While running segway on recovery (training mode), I received this error:

    Traceback (most recent call last):
      File "/mnt/work1/users/home2/rachelc/.local/bin/segway", line 9, in <module>
        load_entry_point('segway==1.4.1.dev0', 'console_scripts', 'segway')()
      File "/mnt/work1/users/home2/rachelc/segway/segway/run.py", line 2919, in main
        return runner()
      File "/mnt/work1/users/home2/rachelc/segway/segway/run.py", line 2712, in __call__
        self.run(*args, **kwargs)
      File "/mnt/work1/users/home2/rachelc/segway/segway/run.py", line 2679, in run
        self.run_train()
      File "/mnt/work1/users/home2/rachelc/segway/segway/run.py", line 2225, in run_train
        self.finish_train(instance_params, dst_filenames)
      File "/mnt/work1/users/home2/rachelc/segway/segway/run.py", line 2207, in finish_train
        self.proc_train_results(instance_params, dst_filenames)
      File "/mnt/work1/users/home2/rachelc/segway/segway/run.py", line 2332, in proc_train_results
        self.copy_results(name, src_filename, dst_filename)
      File "/mnt/work1/users/home2/rachelc/segway/segway/run.py", line 1743, in copy_results
        copy2(src_filename, dst_filename)
      File "/mnt/work1/software/python/2.7/lib/python2.7/shutil.py", line 128, in copy2
        copyfile(src, dst)
      File "/mnt/work1/software/python/2.7/lib/python2.7/shutil.py", line 82, in copyfile
        with open(src, 'rb') as fsrc:
    IOError: [Errno 2] No such file or directory: path('/mnt/work1/users/hoffmangroup/rachelc/2016/semisupervised_tests/20160513_1347/results/20160513-1717/K562_5_Track.traindir/likelihood/likelihood.1.ll')

On further investigation, I found that the number of likelihood files decreases from each recovery directory to the next. This run was recovered several times: the earliest recovery directory contains 10 likelihood files, the second also 10, then 8, 5, 3, 0, and 0 again in the most recent one.
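The per-directory counts above could be reproduced with something like the following. This is only a sketch, not part of segway: it is written for Python 3, the function names are hypothetical, and it assumes each recovery attempt has its own training directory with a `likelihood/` subdirectory holding files named `likelihood.<instance>.ll`, as the path in the traceback suggests.

```python
import re
from pathlib import Path

# Assumed naming scheme, inferred from "likelihood/likelihood.1.ll" in the traceback.
LIKELIHOOD_RE = re.compile(r"^likelihood\.(\d+)\.ll$")


def likelihood_instances(traindir):
    """Return the sorted instance indexes that have a likelihood file in traindir."""
    likelihood_dir = Path(traindir) / "likelihood"
    if not likelihood_dir.is_dir():
        return []
    indexes = []
    for path in likelihood_dir.iterdir():
        match = LIKELIHOOD_RE.match(path.name)
        if match:
            indexes.append(int(match.group(1)))
    return sorted(indexes)


def summarize(traindirs):
    """Map each training directory to the instance indexes found in it."""
    return {str(traindir): likelihood_instances(traindir) for traindir in traindirs}
```

Calling `summarize` on the recovery directories in chronological order would show which instances drop out at each recovery attempt.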

My theory is that a likelihood file is written and updated only while its instance (10 instances in this case) is still active. For example, instance 0 was still running during the first two runs, so its likelihood file appears in the first two directories and in none of the later ones. In the final recovery run, when segway attempts to pick a winner (or do a cumulative analysis or something at the end), it can't find the likelihood files because those instances did not run during that recovery attempt, so it errors out. The files present and missing in each recovery attempt's directory are consistent with this.
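If that theory holds, one possible workaround is to look for each instance's likelihood file in the most recent recovery directory that still has it, instead of only the current one. The sketch below illustrates the idea under the same assumptions about the directory layout; it is Python 3, the function names and arguments are hypothetical, and it is not how segway's `copy_results` actually works.

```python
import shutil
from pathlib import Path


def find_latest_likelihood(traindirs, instance_index):
    """Return the newest existing likelihood file for an instance, or None.

    traindirs must be ordered from the oldest to the newest recovery attempt.
    """
    filename = "likelihood.{}.ll".format(instance_index)
    for traindir in reversed(traindirs):
        candidate = Path(traindir) / "likelihood" / filename
        if candidate.is_file():
            return candidate
    return None


def copy_likelihoods(traindirs, num_instances, dst_dir):
    """Copy the newest likelihood file of each instance into dst_dir.

    Returns the instance indexes for which no file was found anywhere,
    rather than raising on the first missing file.
    """
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for index in range(num_instances):
        src = find_latest_likelihood(traindirs, index)
        if src is None:
            missing.append(index)
            continue
        shutil.copy2(str(src), str(dst_dir / src.name))
    return missing
```

Returning the list of missing instances leaves it to the caller to decide whether a gap is fatal or whether a winner can still be picked from the instances that do have likelihood files.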

EricR86 commented 8 years ago

Original comment by Rachel Chan (Bitbucket: rcwchan).

Edited issue description

EricR86 commented 8 years ago

Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).

changed priority from "major" to "minor"

EricR86 commented 8 years ago

Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).

That hypothesis makes sense to me.
