hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

Warn users when the number of windows exceeds a threshold #124

Open EricR86 opened 6 years ago

EricR86 commented 6 years ago

Original report (BitBucket issue) by Mickaël Mendez (Bitbucket: Mickael Mendez).


A high number of windows can indicate an error in the way the inputs were processed, and it results in very long training or annotation times.

Segway could print a warning message after calculating the windows if their number exceeds an unreasonable threshold (maybe 10,000).

This enhancement was suggested by Michael Hoffman.
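
A minimal sketch of the proposed check, assuming the window count is already available as an integer. The names `MAX_REASONABLE_WINDOWS` and `check_num_windows` are hypothetical and are not part of Segway; the threshold is the 10,000 figure floated above.

```python
import warnings

# Hypothetical threshold, taken from the "maybe 10,000" suggestion above.
MAX_REASONABLE_WINDOWS = 10_000


def check_num_windows(num_windows, threshold=MAX_REASONABLE_WINDOWS):
    """Warn if num_windows exceeds threshold.

    Returns True if a warning was issued, False otherwise.
    """
    if num_windows > threshold:
        warnings.warn(
            f"{num_windows} windows exceeds the threshold of {threshold}; "
            "check how the inputs were processed, since training or "
            "annotation may take a very long time",
            UserWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return True
    return False
```

Going through Python's `warnings` module means a user could silence the message with the standard warning filters, without a dedicated command-line option.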

EricR86 commented 6 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Is there any consideration of what would be considered unreasonable? What if a user wants to train on a lot of specific regions (like GENCODE regions)?

What if a user wants to disable this warning?

Would printing out the number of windows before job submission be sufficient/better?

EricR86 commented 6 years ago

Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).


I was thinking 10000 windows. Printing out the number of windows before submitting jobs might help with this without necessitating an option to disable the warning or an un-disableable warning.

EricR86 commented 6 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


After some discussion, there were thoughts about changing the job submission log output to roughly the following format:

"queued: world#, instance #, window # / total windows (full job name)"

EricR86 commented 6 years ago

Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).


How about "queued: world #/T, instance #/T, window #/T (full job name)"?

I think this will make diagnostics easier.
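
A rough sketch of how a line in that format could be built. The function `format_queued_line` and its parameter names are hypothetical, not Segway's actual logging code, and whether the counters start at 0 or 1 is left to the caller.

```python
def format_queued_line(world, num_worlds, instance, num_instances,
                       window, num_windows, job_name):
    """Build a 'queued' log line in which every counter shows its total."""
    return (
        f"queued: world {world}/{num_worlds}, "
        f"instance {instance}/{num_instances}, "
        f"window {window}/{num_windows} ({job_name})"
    )


# The job name here is a made-up placeholder.
print(format_queued_line(1, 1, 2, 10, 37, 12000, "example-job-name"))
```

Because the total number of windows appears on every queued line, an unexpectedly large count is visible as soon as the first job is submitted.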

EricR86 commented 6 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


I think that would probably be best.