hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

Using the '--prior-strength' option causes training to crash #136

Closed (EricR86 closed 1 year ago)

EricR86 commented 4 years ago

Original report (BitBucket issue) by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


The following error is reported by GMTK:


ERROR: reading file 'traindir/params/input.0.master' line 178, DenseCPT 'segCountDown_seg_segTransition' specified Dirichlet Table (dirichlet_segCountDown_seg_segTransition) that does not exist

EricR86 commented 4 years ago

Original comment by mlibbrecht (Bitbucket: mlibbrecht, GitHub: mlibbrecht).


Hi Eric – We’re working on a new round of ENCODE Segway annotations and we’d like to use --prior-strength. Is there any plan to fix this issue? Thanks!

EricR86 commented 4 years ago

Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).


--prior-strength never really had much of the desired effect. The --segtransition-weight-scale option has a bigger effect.

EricR86 commented 4 years ago

Original comment by mlibbrecht (Bitbucket: mlibbrecht, GitHub: mlibbrecht).


I did some experiments on this a while ago. I believe the result was that --segtransition-weight-scale was only effective when used together with --prior-strength. I think the exact value of --prior-strength wasn’t too important, but it was important that it was nonzero. The experiments have since been archived, so unfortunately they’re hard to dig up.

EricR86 commented 4 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


If I recall correctly, this error came from a regression involving the mixture-of-Gaussians change (where --prior-strength is not actively tested). We switched from a Dirichlet Table to constants. When I last took a cursory look at this, resurrecting the table and re-introducing it for the --prior-strength option looked like a non-trivial amount of work. Obviously it would be ideal if you could get equivalent or better results without using it. As for plans, this issue is unfortunately not high on my todo list at the moment, but it may move up depending on how @michaelmhoffman wishes to triage it.
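
For context on why a nonzero prior could matter, here is a minimal sketch of Dirichlet-style smoothing on one row of transition counts. It is not Segway or GMTK code. It assumes the prior acts as a uniform additive pseudocount, and the function name `smooth_row` and the example counts are hypothetical, chosen only for illustration.

```python
def smooth_row(counts, pseudocount):
    """Normalize one row of transition counts after adding a uniform pseudocount.

    Hypothetical helper for illustration only; not part of Segway or GMTK.
    """
    if pseudocount < 0:
        raise ValueError("pseudocount must be non-negative")
    smoothed = [c + pseudocount for c in counts]
    total = sum(smoothed)
    if total == 0:
        raise ValueError("row has no probability mass")
    return [s / total for s in smoothed]


# Made-up expected counts for one row of a transition table.
counts = [0.0, 9.0, 1.0]

# With a zero pseudocount, a transition that was never observed keeps probability 0.
no_prior = smooth_row(counts, 0.0)

# With any positive pseudocount, every transition keeps some probability mass.
with_prior = smooth_row(counts, 1.0)

print(no_prior)
print(with_prior)
```

Under this assumption, the difference between zero and nonzero is qualitative (an unobserved transition either stays at exactly 0 or it does not), while changing the size of a positive pseudocount only shifts the estimates gradually. That would be consistent with the observation above that the value mattered less than it being nonzero, though the thread itself does not confirm this mechanism.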