hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

Recover option for segway annotate #159

Closed gaborpapp89 closed 2 years ago

gaborpapp89 commented 2 years ago

Hi!

I have an interrupted annotate task, and I would like to use the --recover option, but Segway (3.0.3) does not recognize it:

error: 'unrecognized arguments: --recover=identifydir_previous'

I use it like:

segway annotate --exclude-coords=blacklist.bed --recover=identifydir_previous samples.genomedata traindir identifydir_new

How do I use it correctly? I could not figure it out on my own. Thank you in advance.

EricR86 commented 2 years ago

Hi,

The original command you posted mentioned --recovery rather than --recover, so let me know if that fixed it for you. Otherwise, it looks like it should be correct. If it's still not working, can you post the full command and its output here so I can investigate?

Thanks!

gaborpapp89 commented 2 years ago

Thank You for your quick answer!

The original command I used (with --recover) is:

'SEGWAY_CLUSTER=local segway annotate --exclude-coords=blacklist_HCT116_merged_d500.bed --recover=identifydir12_new lab_merged_samples.genomedata train12 identifydir12new'

and the error message:

'usage: segway [global_args] COMMAND [args]... segway: error: unrecognized arguments: --recover=identifydir12_new'

Edit: I also got the same error when I tried the resolution=100 option in an annotate task:

segway: error: unrecognized arguments: --resolution

In the documentation I read: 'Warning: You must use the same resolution for both training and identification'. I inferred from this that the annotate task should have a resolution option as well. For the training task I used resolution=100 and it worked fine.

Thank You!

EricR86 commented 2 years ago

Hi, it looks like this is indeed a bug. It appears to be a regression in the new Segway 3.0 series caused by an interface change. I don't have an immediate fix, so your best bet for now is probably to re-run the annotation if it's not too computationally expensive.

Thanks for the report!

gaborpapp89 commented 2 years ago

Thank You for your answer!

Actually, the computation time with these options is quite long for the annotation task.

That is why I also edited my previous comment about the resolution option. Could you please check it? If this option exists for an annotation task, maybe it could reduce the computation time.

Thanks

EricR86 commented 2 years ago

The resolution option does work. However, it assumes that training and annotation are done at the same resolution. For example, if you trained your model at 10 bp, your annotation will also be at 10 bp resolution. So the option doesn't exist for the annotation step; it just uses whatever resolution you trained at.
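As a rough illustration of why resolution matters so much for run time, here is a minimal Python sketch that counts how many positions (frames) a region contributes at a given resolution. The function name `frames_at_resolution` is hypothetical and not part of Segway; the sketch assumes one frame per `resolution_bp` window and that a trailing partial window still counts as one frame.

```python
def frames_at_resolution(region_length_bp: int, resolution_bp: int) -> int:
    """Count frames in a region, assuming one frame per resolution_bp window.

    Hypothetical helper for illustration only; not a Segway function.
    """
    if region_length_bp < 0:
        raise ValueError("region_length_bp must be non-negative")
    if resolution_bp < 1:
        raise ValueError("resolution_bp must be at least 1")
    # Integer ceiling division: a trailing partial window counts as one frame.
    return (region_length_bp + resolution_bp - 1) // resolution_bp


if __name__ == "__main__":
    region = 1_000_000  # an arbitrary 1 Mbp region, chosen for illustration
    for resolution in (1, 10, 100):
        print(resolution, frames_at_resolution(region, resolution))
```

Under these assumptions, going from 1 bp to 100 bp resolution cuts the number of frames in a 1 Mbp region from 1,000,000 to 10,000, which is the kind of reduction that makes both training and annotation cheaper.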

The biggest savings in computation time come from:

  1. Resolution
  2. Genomic region selection: only annotate and train on regions that are mappable (e.g. ignore telomeric regions)
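To make the region-selection point concrete, here is a minimal Python sketch that removes excluded intervals (for example, blacklisted or unmappable stretches) from a region and returns what is left to annotate. The name `subtract_intervals` and the coordinates are hypothetical, chosen only for illustration; this is not how Segway implements --exclude-coords. Intervals are assumed to be half-open `[start, end)`, as in BED files.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED


def subtract_intervals(region: Interval, excluded: List[Interval]) -> List[Interval]:
    """Return the parts of `region` not covered by any interval in `excluded`.

    Hypothetical helper for illustration only; not a Segway function.
    """
    start, end = region
    kept: List[Interval] = []
    cursor = start  # left edge of the stretch not yet known to be excluded
    for ex_start, ex_end in sorted(excluded):
        if ex_end <= cursor:
            continue  # excluded interval lies entirely behind the cursor
        if ex_start >= end:
            break  # remaining excluded intervals start past the region
        if ex_start > cursor:
            kept.append((cursor, ex_start))  # gap before this exclusion is kept
        cursor = max(cursor, ex_end)
        if cursor >= end:
            break
    if cursor < end:
        kept.append((cursor, end))  # tail of the region after the last exclusion
    return kept


if __name__ == "__main__":
    # Made-up coordinates: a 1000 bp region with overlapping exclusions.
    kept = subtract_intervals((0, 1000), [(0, 100), (400, 500), (450, 600), (950, 1200)])
    print(kept)  # [(100, 400), (600, 950)]
    print(sum(e - s for s, e in kept))  # 650 bp left to process
```

In this made-up example, 350 of the 1000 bp are excluded, so only 650 bp would need to be processed.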

gaborpapp89 commented 2 years ago

Thanks!

EricR86 commented 2 years ago

@gaborpapp89 this has been fixed in the newest 3.0.4 release. Let us know if you find any further issues.