Open wsnoble opened 6 months ago
Also, FYI, this sentence is hard to understand because split-sequences is not mentioned elsewhere in the quick-start: "Therefore, it is best to combine --minibatch-fraction with --split-sequences."
After further searching the docs, I can't find any explanation of what split-sequences does or what value it takes. Can someone please explain this?
--split-sequences puts an upper limit, in base pairs, on the size of the windows used for training and inference. The default is 2000000 bp.
For more details see: https://segway.readthedocs.io/en/latest/technical.html#memory-usage
The defaults for all options are listed both in the --help command-line output and in the command-line usage summary portion of the docs.
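To make the window-size limit concrete, here is a minimal Python sketch of what capping windows at a maximum number of base pairs could look like. It is only an illustration: `split_window` is a hypothetical helper, not part of Segway, and the simple left-to-right chunking is an assumption. Segway's actual splitting strategy may differ (see the memory-usage section linked above).

```python
def split_window(start, end, max_size=2_000_000):
    """Split the half-open interval [start, end) into consecutive pieces.

    Each piece is at most max_size base pairs long. This is a hypothetical
    helper for illustration only, not Segway's implementation.
    """
    if max_size <= 0:
        raise ValueError("max_size must be a positive number of base pairs")
    # Step through the window in max_size strides; the last piece may be shorter.
    return [(pos, min(pos + max_size, end)) for pos in range(start, end, max_size)]


# A 5 Mbp window with the default 2,000,000 bp limit yields three pieces.
pieces = split_window(0, 5_000_000)
print(pieces)  # [(0, 2000000), (2000000, 4000000), (4000000, 5000000)]
```

The cuts in this sketch fall at plain base-pair coordinates, so the limit is expressed purely in bp.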
Does the original resolution of the Genomedata archive have any impact on the --split-sequences option?
No, there is no inherent "resolution" for Genomedata. --split-sequences effectively splits at base-pair boundaries, regardless of the underlying resolution the dataset originates from.
Thank you!
I'm reading the quickstart, and one minor thing that I think would be very helpful is if this documentation mentioned, for each option, what the default value is if you don't specify it.