hyanwong / giglib

MIT License
4 stars 2 forks source link

Order of iedge items #56

Closed hyanwong closed 9 months ago

hyanwong commented 9 months ago

A trivial question, but when we access or add an iedge, what should the order of items be?

Currently we use (parent, child, parent_left, child_left, parent_right, child_right, others_including_chr_and_edge_id)

Tskit uses (left, right, parent, child)

I wonder if we should switch to (parent_left, parent_right, child_left, child_right, parent, child, ...) or (child_left, child_right, parent_left, parent_right, child, parent, ...)?

hyanwong commented 9 months ago

Since we have established the convention that child_left < child_right, I'm slightly inclined to put the child first, so that the span is always easy to see from the first 2 columns. I.e.

(child_left, child_right, parent_left, parent_right, child, parent, ...)

or

(child, parent, child_left, child_right, parent_left, parent_right, ...)

I'm inclined to go for the first one.

duncanMR commented 9 months ago

The current implementation has been bothering me, because it splits up the intervals. My preference is also to put the child first to align with our convention, i.e.

(child_left, child_right, parent_left, parent_right, child, parent, ...)

My only concern with this is that it breaks with tskit's convention of putting the parent node before the child, but I don't think that is a big issue. We can just add that to the list of differences between gigl and tskit.

hyanwong commented 9 months ago

Great, glad we agree. I'll change it.

It is helpful to have some unnamed parameters (so we can fit an add_row call on one line) but confusing with so many numbers. I suggest we adopt the following convention:

iedges.add_row(0, 1, 20, 21, child=100, parent=200)

I.e. we always specify the child & parent keywords. We can enforce this using _: KW_ONLY in the data class, if we want. What do you think @duncanMR ?

duncanMR commented 9 months ago

I agree; that will make it less confusing to read.