hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

segway overwrites custom models! #25

Closed EricR86 closed 9 years ago

EricR86 commented 9 years ago

Original report (BitBucket issue) by Anonymous.

The original report had attachments: segway_overwritten.zip, diff.summary


This is how to reproduce the error. I simply used the test genomedata from your website and made some rather vanilla changes to the structure and input master files. See attached zip file for the custom files used in this test.

$ diff custom.str traindir/segway.str
3c3
< GRAPHICAL_MODEL model_custom
---
> GRAPHICAL_MODEL model_seg
8d7
< switchingparents: nil;
14d12
< switchingparents: nil;
21d18
< switchingparents: nil;

$ diff custom.inc traindir/auxiliary/segway.inc

$ diff custom.master traindir/params/input.master
1c1
< #include "custom.inc"
---
> #include "traindir/auxiliary/segway.inc"

After this, the "custom.str" and "custom.master" files are copied to the "custom_backup" folder. Then we run the following command:

segway --clobber --seg-table=custom.tab --resolution=200 --ruler-scale=200 --input-master=custom.master --structure=custom.str --cluster-opt="-q scavenger -P scavenger -W 01:00" --num-labels=4 train test.genomedata customdir

After segway finished running, both files had been overwritten without any warning.
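
The backup-and-compare step above can be sketched as two small shell helpers. This is only an illustration: `snapshot_inputs` and `report_overwritten` are hypothetical names, not part of segway, and the sketch assumes the custom files have distinct base names.

```shell
#!/bin/sh
# Hypothetical helpers (not part of segway) for checking whether a run
# modified the custom model files passed on the command line.

# snapshot_inputs DIR FILE...: copy each FILE into DIR before the run.
snapshot_inputs() {
    backup_dir=$1
    shift
    mkdir -p "$backup_dir" || return 1
    for f in "$@"; do
        cp -p "$f" "$backup_dir/" || return 1
    done
}

# report_overwritten DIR FILE...: print each FILE whose contents differ
# from its copy in DIR; returns non-zero if any file differs.
report_overwritten() {
    backup_dir=$1
    shift
    rc=0
    for f in "$@"; do
        if ! cmp -s "$backup_dir/$(basename "$f")" "$f"; then
            echo "overwritten: $f"
            rc=1
        fi
    done
    return $rc
}
```

Intended use: call `snapshot_inputs custom_backup custom.str custom.master` before the segway command, and `report_overwritten custom_backup custom.str custom.master` after it finishes.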

EricR86 commented 9 years ago

Original comment by Li Shen (Bitbucket: lishen01).


The text I pasted seems to have been interpreted as wiki markup, so I am attaching it again.

EricR86 commented 9 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).



Note that the test genomedata archive mentioned here refers to: http://pmgenomics.ca/hoffmanlab/proj/segway/2011/test.genomedata

Also, in the attached segway_overwritten.zip the files in its backup folder already differ from those in its base folder:

$ diff -q custom_backup/custom.inc custom.inc 
$ diff -q custom_backup/custom.master custom.master 
Files custom_backup/custom.master and custom.master differ
$ diff -q custom_backup/custom.str custom.str
Files custom_backup/custom.str and custom.str differ

However, after first copying the files from the backup folder and then running the command as given above, custom.str does differ from the backup, as reported.

EricR86 commented 9 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


The issue here is the --clobber option used in the example provided, whose behavior in this context is unclear.

When you specify custom model files on the command line, segway either writes to that filename (for example, to save the structure to a specific file or directory) or, if the file already exists, uses it as input (as for training in this case).

However, if the --clobber option is given, segway assumes that the files specified on the command line are outputs to be overwritten regardless of what is already there. Hence --clobber does not work together with a model input such as a structure file: segway simply writes to that filename, overwriting any existing file or creating a new one.
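
To make the rule described above concrete, here is a minimal shell sketch of the decision. It only models the behavior as described in this comment; `file_role` is a hypothetical helper and is not taken from segway's source code.

```shell
#!/bin/sh
# Hypothetical model of the behavior described above (not segway's code).
# file_role PATH CLOBBER: print "input" if PATH would be read as a model
# input, or "output" if it would be written (created or overwritten).
# CLOBBER is "yes" when --clobber is given, anything else otherwise.
file_role() {
    path=$1
    clobber=$2
    if [ "$clobber" = "yes" ]; then
        # With --clobber, every named file is treated as an output.
        echo output
    elif [ -e "$path" ]; then
        # Without --clobber, an existing file is used as input.
        echo input
    else
        # Without --clobber, a missing file is created as output.
        echo output
    fi
}
```

Under this model, an existing custom.str is an input without --clobber and an output with it, which matches the overwrite seen in the report.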

There is no code fix planned for this. The issue can be worked around by not using --clobber when custom inputs are specified. The only downside is that on re-runs you have to remove or change your output work directory.
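
A rough sketch of that workaround for re-runs follows. The `fresh_workdir` helper is hypothetical and only removes the previous work directory; the segway command in the comment is the one from the report with --clobber dropped.

```shell
#!/bin/sh
# Hypothetical helper (not part of segway): remove a previous output work
# directory so that a re-run without --clobber starts from a clean state.
fresh_workdir() {
    workdir=$1
    # Refuse obviously dangerous targets before calling rm -rf.
    case "$workdir" in
        ""|"/"|"."|"..")
            echo "refusing to remove '$workdir'" >&2
            return 1
            ;;
    esac
    rm -rf -- "$workdir"
}

# Intended use (not executed here), with the command from the report
# minus --clobber:
#   fresh_workdir customdir
#   segway --seg-table=custom.tab --resolution=200 --ruler-scale=200 \
#       --input-master=custom.master --structure=custom.str \
#       --cluster-opt="-q scavenger -P scavenger -W 01:00" \
#       --num-labels=4 train test.genomedata customdir
```

Choosing a new work directory name for each run avoids the removal step entirely.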

The documentation will be updated and the commit will be referenced here.

EricR86 commented 9 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


As of 3f2f7da9778850f854714d521e4ffc8cfeb64b2f the documentation is updated. The documentation change can be found in this commit: a58073fd5bfefecbb72fb5031aea801c25f47cd3