graph-genome / component_segmentation

Read in ODGI Bin output and identify co-linear components
Apache License 2.0

Improvement: Don't let components Self-Segment #50

Open josiahseaman opened 3 years ago

josiahseaman commented 3 years ago

Bins can contain ranges of positions that are not in the same order for all individuals, even though all of the nucleotides are contained within the bin. In these cases the coverage is still less than 1.0, yet no true external rearrangement is present. There should be no self loop, no link column, and no arrows. In fact, this should not be a valid criterion for a divider at all. In the image below, the two components to the right are also separated only by self loops based on internal coordinates. Those three could be merged into a single new component.

[Image not available]
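The sketch below shows the proposed rule on a toy model. It makes some assumptions that are not in the issue and not taken from component_segmentation's code: each individual's path is a list of 0-based bin indices in traversal order, and every non-sequential step (`b != a + 1`) counts as a link. The function names are hypothetical. Without the rule, a self loop (`a == b`) cuts on both sides of its bin and isolates it as its own component. With the rule, the self loop is ignored and the neighbouring components merge.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (from_bin, to_bin); hypothetical representation


def find_links(paths: Dict[str, List[int]]) -> Set[Link]:
    """Collect every non-sequential step taken by any individual's path."""
    links: Set[Link] = set()
    for path in paths.values():
        for a, b in zip(path, path[1:]):
            if b != a + 1:  # anything other than stepping to the next bin
                links.add((a, b))
    return links


def split_points(links: Set[Link], ignore_self_loops: bool = True) -> Set[int]:
    """Bin indices where a new component starts."""
    cuts: Set[int] = set()
    for a, b in links:
        if ignore_self_loops and a == b:
            # Internal reordering within one bin: not a valid divider.
            continue
        cuts.add(a + 1)  # leaving bin a ends a component after a
        cuts.add(b)      # arriving at bin b starts a component at b
    return cuts


def segment(n_bins: int, cuts: Set[int]) -> List[Tuple[int, int]]:
    """Turn cut positions into inclusive (first_bin, last_bin) components."""
    boundaries = sorted(c for c in cuts if 0 < c < n_bins)
    components: List[Tuple[int, int]] = []
    start = 0
    for c in boundaries:
        components.append((start, c - 1))
        start = c
    components.append((start, n_bins - 1))
    return components


if __name__ == "__main__":
    # Individual A revisits bin 2 because of internal ordering; B does not.
    paths = {"A": [0, 1, 2, 2, 3, 4], "B": [0, 1, 2, 3, 4]}
    links = find_links(paths)
    print(segment(5, split_points(links, ignore_self_loops=False)))
    print(segment(5, split_points(links)))
```

Running the example prints `[(0, 1), (2, 2), (3, 4)]` when self loops are allowed to act as dividers, and `[(0, 4)]` when they are ignored. This corresponds to the three-into-one merge described above.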

A desirable criterion is that, within each bin, each individual's genome coordinates be contiguous. That particular restriction may require further thought and design.
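The per-bin contiguity criterion could be checked roughly as follows. This is a sketch, and it assumes that a bin stores one individual's nucleotide ranges as inclusive `(start, end)` pairs in arbitrary order. That data model and the function name are hypothetical. The ranges are sorted and merged, and they count as contiguous only if they collapse into a single unbroken interval. Order does not matter, which fits the case where positions are reordered inside a bin but all of them are present.

```python
from typing import List, Tuple


def is_contiguous(ranges: List[Tuple[int, int]]) -> bool:
    """True if inclusive (start, end) ranges cover one unbroken interval."""
    if not ranges:
        return True
    ordered = sorted(ranges)  # order within the bin is irrelevant
    reach = ordered[0][1]     # furthest position covered so far
    for start, end in ordered[1:]:
        if start > reach + 1:  # a gap: some coordinate lies outside this bin
            return False
        reach = max(reach, end)
    return True


if __name__ == "__main__":
    print(is_contiguous([(11, 20), (1, 10)]))  # reordered but gap-free
    print(is_contiguous([(1, 10), (12, 20)]))  # position 11 is missing
```

The first call prints `True` and the second prints `False`.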

Important Note: I estimate that this issue accounts for at least half of the current count of Components in Schematics produced by Pantograph. Completing this feature could drastically improve the overall quality of Pantograph results.