koisland opened this issue 6 months ago
`censtats status` will also make false calls when the edges of a centromeric contig contain less than the full 500 kbp of the centromeric transition region.
One idea is to change the heuristic for determining partial contigs: instead of using a flat percentage of alpha-satellite at the edges, calculate entropy across the contig. Given the structure of the centromere, the entropy profile should look like an upward-opening parabola: high entropy at the edges, where monomeric alpha-satellite and other repeats are found, and low entropy (uniformity) in the center.
`censtats status` will miss partial calls when the contig edge is entirely HSAT, as in partial centromeres on chr13 and chr21.