hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

High resolution causes error - Questions about interpretation of results #182

Open ChelseyLin3 opened 2 months ago

ChelseyLin3 commented 2 months ago

Hi, I created a genomedata archive from a BED file with a resolution of 100kb to use as the Segway input. Is there a way to specify an input resolution larger than a 10kb bin size? Setting the --resolution flag to 100kb seems to produce an error involving an intermediate file.

Screen Shot 2024-07-11 at 11 57 39 AM

And the intermediate file looked like this. Thank you in advance for any suggestions! @wsnoble

Screen Shot 2024-07-11 at 11 57 28 AM

EricR86 commented 1 month ago

The issue is caused by the high --resolution value. The presence variable tracks missing data across downsampled datasets, and in this case it attempts to track 100000 underlying datapoints that may be missing per downsample.

Obviously not ideal, but a lower resolution (e.g. 1000) should give the same results without the error, and still provide a significant speedup on lower-resolution data.
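
To illustrate why the presence variable grows with --resolution, here is a minimal sketch in plain Python. It is not Segway's actual code: the function name `downsample_with_presence` is hypothetical, and it assumes that presence is a per-window count of non-missing datapoints and that missing values are represented as NaN.

```python
import math

def downsample_with_presence(values, resolution):
    """Average each window of `resolution` values, ignoring NaNs.

    Hypothetical helper for illustration only (not part of Segway).
    Returns (means, presence), where presence[i] counts the non-missing
    datapoints in window i, so it can take any value from 0 to `resolution`.
    """
    means, presence = [], []
    for start in range(0, len(values), resolution):
        window = values[start:start + resolution]
        observed = [v for v in window if not math.isnan(v)]
        presence.append(len(observed))
        # A window with no observed data stays missing after downsampling.
        means.append(sum(observed) / len(observed) if observed else math.nan)
    return means, presence

signal = [1.0, math.nan, 3.0, 5.0, math.nan, math.nan, math.nan, math.nan]
means, presence = downsample_with_presence(signal, resolution=4)
print(means)     # [3.0, nan]
print(presence)  # [3, 0]
```

Under that assumption, a resolution of 100000 means each presence value ranges over 0 to 100000, while a resolution of 1000 keeps it within 0 to 1000. With data binned at 100kb, a resolution of 1000 still divides each bin evenly into 100 downsampled windows.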