hyanwong / giglib

MIT License

Change to sorting by child_right (not child_left)? #122

hyanwong closed this issue 6 months ago

hyanwong commented 6 months ago

At the moment we claim that iedges are sorted by their child node time and ID, then child_left coordinate. In fact, since edges for a given child can't overlap, this is identical to sorting by child_right.
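As a quick illustration of why the two orderings coincide, here is a minimal sketch using made-up `(child_left, child_right)` intervals for a single child. The tuples are purely hypothetical and are not giglib's actual table layout.

```python
# Hypothetical (child_left, child_right) intervals for ONE child node.
# They do not overlap, as required for the iedges of a single child.
intervals = [(50, 80), (0, 20), (20, 50), (90, 100)]

# Sort once by the left coordinate and once by the right coordinate.
by_left = sorted(intervals, key=lambda iv: iv[0])
by_right = sorted(intervals, key=lambda iv: iv[1])

# Because the intervals are disjoint, both keys give the same order.
assert by_left == by_right

# With this ordering, the last interval carries the largest right coordinate.
max_right = by_right[-1][1]
```

The equivalence only holds within one child: once intervals from different children (or overlapping intervals in a parent) are mixed, the two keys can disagree.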

However, if we also want to be able to iterate over iedges with the same parent ID, it does make a difference whether the ordering uses the left or right coordinate. The advantage of sorting by the right coordinate is that we can easily find the maximum genome length (which would be given by the last edge for that chromosome).

It might be that, if we are focusing on edges with the same parent ID, we would want to create an index sorted by parent_right (or parent_left), but this is potentially trickier because of inversions? I guess it would be possible to sort by max(parent_left, parent_right).

hyanwong commented 6 months ago

Thinking about this more, I'm coming down on the side of sorting the .iedge_map_sorted_by_parent array by parent_chromosome then max(parent_left, parent_right), with ties broken by child id.
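A rough sketch of that ordering in plain Python is below. The `IEdge` record, its field names, and the `parent_sort_key` helper are hypothetical stand-ins for illustration, not giglib's real classes. Using the parent ID as the leading key is also an assumption, made because the index is meant for iterating over iedges that share a parent. An inverted edge is modelled here as one with `parent_left > parent_right`.

```python
from typing import NamedTuple


class IEdge(NamedTuple):
    # Hypothetical record; field names follow the discussion, not giglib's tables.
    child: int
    child_left: int
    child_right: int
    parent: int
    parent_left: int
    parent_right: int
    parent_chromosome: int = 0


def parent_sort_key(e: IEdge):
    # Assumed key: parent ID, then chromosome, then the larger of the two
    # parent coordinates (so inversions sort sensibly), then child ID.
    return (e.parent, e.parent_chromosome, max(e.parent_left, e.parent_right), e.child)


iedges = [
    IEdge(child=0, child_left=0, child_right=10, parent=2, parent_left=0, parent_right=10),
    IEdge(child=1, child_left=0, child_right=10, parent=2, parent_left=30, parent_right=20),  # inversion
    IEdge(child=0, child_left=10, child_right=20, parent=2, parent_left=10, parent_right=20),
    IEdge(child=1, child_left=10, child_right=15, parent=3, parent_left=0, parent_right=5),
    IEdge(child=0, child_left=20, child_right=25, parent=3, parent_left=5, parent_right=0),  # inversion
]

# An index into `iedges`, analogous in spirit to a "sorted by parent" map.
sorted_by_parent = sorted(range(len(iedges)), key=lambda i: parent_sort_key(iedges[i]))

# The last iedge for parent 2 holds the largest coordinate used in that parent.
last_for_parent_2 = [i for i in sorted_by_parent if iedges[i].parent == 2][-1]
max_coord_parent_2 = max(iedges[last_for_parent_2].parent_left,
                         iedges[last_for_parent_2].parent_right)
```

In this sketch the two iedges into parent 3 share the same `max(parent_left, parent_right)`, so the child ID decides their relative order.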

hyanwong commented 6 months ago

Now sorting by max(left, right) when looking at parents