hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

Doc suggestion #180

Open wsnoble opened 3 months ago

wsnoble commented 3 months ago

I'm reading the quickstart, and one minor thing that I think would be very helpful is for the documentation to state, for each option, the default value used when you don't specify it.

wsnoble commented 3 months ago

Also, FYI, this sentence is hard to understand because split-sequences is not mentioned elsewhere in the quickstart: "Therefore, it is best to combine --minibatch-fraction with --split-sequences."

wsnoble commented 3 months ago

After further searching the docs, I can't find any explanation of what split-sequences does or what value it takes. Can someone please explain this?

EricR86 commented 3 months ago

--split-sequences puts an upper limit, in base pairs, on the size of the windows used for training and inference. The default is 2000000 bp.

For more details see: https://segway.readthedocs.io/en/latest/technical.html#memory-usage

The defaults for all options are listed both in the --help command-line output and in the command-line usage summary section of the docs.
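
To make the window-size cap concrete, here is a minimal Python sketch of what "an upper limit on window size in base pairs" means. It is only an illustration, not Segway's actual code: the function name split_window and its half-open coordinate convention are hypothetical, and Segway may place the split points differently (for example, to make the pieces more evenly sized). Only the 2000000 bp default is taken from the comment above.

```python
def split_window(start, end, max_size=2_000_000):
    """Split the half-open interval [start, end) into consecutive pieces,
    each at most max_size base pairs long.

    Hypothetical helper for illustration only; not part of Segway.
    """
    if max_size <= 0:
        raise ValueError("max_size must be a positive number of base pairs")
    if end < start:
        raise ValueError("end must not be smaller than start")
    # Step through the interval in strides of max_size; the last piece
    # is truncated so it never runs past `end`.
    return [(s, min(s + max_size, end)) for s in range(start, end, max_size)]


# A 5 Mbp region with the 2000000 bp default yields three windows.
for piece in split_window(0, 5_000_000):
    print(piece)
```

With these inputs the sketch prints (0, 2000000), (2000000, 4000000) and (4000000, 5000000). A region shorter than the limit comes back as a single unchanged window.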

ChelseyLin3 commented 3 months ago

Does the original resolution of genomedata have any impact on the --split-sequences function?

EricR86 commented 3 months ago

No, there is no inherent "resolution" for Genomedata. --split-sequences splits at base-pair boundaries, regardless of the resolution of the data the dataset was built from.

ChelseyLin3 commented 3 months ago

Thank you!