Closed hyanwong closed 10 months ago
When calling tables.graph()
, we can also populate the (new) edge column of the iedges table.
Another really useful function (which I will need) is a conversion function to take a location between child_left and child_riught, and convert it to a parent position. I'm not sure of the best name, perhaps edge.parent_position(child_position)
(and `edge.child_position(parent_position)
)? Suggestions @duncanMR ?
We are starting to get to the stage where it might be useful to have a "immutable" GIG. I.e. like in tskit, we turn the tables representation of a GIG into an immutable object, with defined properties (e.g. an index, so we can iterate over edges by child ID, an "edge" class so that we can do
edge.is_inversion()
, and so on.I have some code to start doing this. I suggest something like:
import GeneticInheritanceGraph as gig import msprime ts = msprime.sim_ancestry(5) tables = gig.Tables.from_tree_sequence(ts) gig = tables.graph() # This should check that we have a valid GIG defined, e.g. abs(child_right - child_left) == abs(parent_right - parent_left) print(gig.edges[0]) print(gig.edges.child_left[0]) ...
I've been working on something like this for visualisation experiments, based on Jerome's code (see a snippet below). At the moment I have it in a file called graph.py
. It's not ready for merging due to its reliance on Pandas and inconsistent naming, but do you think the overall organisation makes sense? I will rename every instance of edge to iedge given our discussion earlier.
I notice an error in your code that I also kept making: naming an instance of a GIG object gig
, when you have already imported the package using that name. I've switched to using import GeneticInheritanceGraph as gig_lib
to avoid this. I'm not convinced gig_lib
is the best name to use, but I do think we should reserve the name gig
for class instances just like ts
in tskit.
Another really useful function (which I will need) is a conversion function to take a location between child_left and child_right, and convert it to a parent position. I'm not sure of the best name, perhaps
edge.parent_position(child_position)
(and`edge.child_position(parent_position)
)? Suggestions @duncanMR ?
I've implemented this below in the translate_coordinate_along_edge
helper function, which is for a particular iedge; it is used in the GIG class method translate_coordinate
to find the set of all x values in a parent from an input x in the child's genome, and similarly for an input x in the parent's genome. I made direction an argument to avoid repetition, but it's probably wise to at least have aliases for each direction. I'm happy with your names for such aliases; do you mean that you want to make them methods for a particular iedge, or for the IEdges class?
def translate_coordinate_along_edge(edge, x, direction):
if direction == "up":
if edge.child_left <= x < edge.child_right:
parent_length = edge.parent_right - edge.parent_left
child_length = edge.child_right - edge.child_left
if parent_length == -child_length:
# inversion
return edge.parent_left - (x - edge.child_left)
elif edge.child_left != edge.parent_left:
# shift
return x - (edge.child_right - edge.parent_right)
else:
# no change
return x
elif direction == "down":
if (
edge.parent_left <= x < edge.parent_right
or edge.parent_left > x >= edge.parent_right
):
parent_length = edge.parent_right - edge.parent_left
child_length = edge.child_right - edge.child_left
if parent_length == -child_length:
# inversion
return edge.child_left + (edge.parent_left - x)
elif edge.child_left != edge.parent_left:
# shift
return x + (edge.child_right - edge.parent_right)
else:
# no change
return x
else:
raise ValueError(f"Invalid direction {direction}")
class GIG:
"""
A GIG is an ARG in which edges are annotated with coordinate
transform functions. Nodes in a GIG are integers, and represent
specific ancestral genomes (identical to an ARG). Edges
represent a genetic inheritance relationship between two
genomes. Finally, coordinate transforms define what parts
of a genome were inherited and the effects of any structural
variants that may have occured.
"""
def __init__(self, table_group):
self.intervals = table_group.intervals._df()
self.nodes = table_group.nodes._df()
self.node_explorer = NodeExplorer(self, 0)
pass
@property
def num_nodes(self):
"""
Return the number of nodes in the GIG
"""
return len(self.nodes)
def parents(self, u):
"""
Return the parents of node u in the graph.
"""
all_parents = self.intervals[self.intervals.child == u]["parent"]
return all_parents.drop_duplicates()
def children(self, u):
"""
Return the children of node u in the graph.
"""
all_children = self.intervals[self.intervals.parent == u]["child"]
return all_children.drop_duplicates()
def transate_coordinate(self, p, c, x_in, direction, capture=False):
"""
Transforms a coordinate in one node to a (possibly empty) set of
coordinates in an adjacent node. If direction is "up", then the
x value is assumed to be in the coordinate space of the child node;
if "down", it is assumed to be in the coordinate space of the parent.
The NodeExplorer class records the inputs and outputs of the
transformation as a tree is constructed from the graph;
if capture is True, then the edge between p and c is recorded
in the edges table of the tree sequence being constructed.
"""
edge_family = self.intervals.query("parent == @p and child == @c")
X = []
for _, edge in edge_family.iterrows():
x_out = translate_coordinate_along_edge(edge, x_in, direction)
if x_out is not None:
self.node_explorer.update(
edge=edge,
direction=direction,
x_in=x_in,
x_out=x_out,
capture=capture,
)
X.append(x_out)
return set(X)
...
This looks good, thanks @duncanMR . I think we should be using "span" rather than "length". In fact, in my version (also called graph.py
!), I have something like this:
@property
def child_span(self):
return self.child_right - self.child_left
@property
def parent_span(self):
return self.parent_right - self.parent_left
@property
def span(self):
return self.child_span if child_span >= 0 else self.parent_span
@property
def is_inversion(self):
return self.child_span < 0 or self.parent_span < 0
I could push my graph.py
file in a PR so you can see the sort of thing I'm thinking about. We could have "to_df" methods on the various tables if you need conversion to a pandas data frame.
This looks good, thanks @duncanMR . I think we should be using "span" rather than "length". In fact, in my version (also called
graph.py
!), I have something like this:@property def child_span(self): return self.child_right - self.child_left @property def parent_span(self): return self.parent_right - self.parent_left @property def span(self): return self.child_span if child_span >= 0 else self.parent_span @property def is_inversion(self): return self.child_span < 0 or self.parent_span < 0
That's a good point; I'll use span in future.
I could push my
graph.py
file in a PR so you can see the sort of thing I'm thinking about. We could have "to_df" methods on the various tables if you need conversion to a pandas data frame.
That would be great! I wasn't planning on pushing any of my code soon, since it is focused on making visualisations of simple examples.
I noticed an error in your code that I also kept making: naming an instance of a GIG object gig, when you have already imported the package using that name.
Yes, good spot, and a telling mistake. We could, I suppose, use g
as the name of the graph variable, but gig
is more expressive. As you say, gig_lib
or gigl
(😆 ) might work. Or genigraph
? I'm open to other suggestions. We don't use tskit
that often in raw code: mainly for reading in files and for constants.
I think import GeneticInheritanceGraphLibrary as gigl
is a sensible and fun choice! The length isn't ideal, but I haven't thought of a shorter name that works.
Great, thanks. I agree. I will return to this after PopGroup. gigl
also rolls off the tongue nicely.
Closing this as it's now done.
We are starting to get to the stage where it might be useful to have a "immutable" GIG. I.e. like in tskit, we turn the tables representation of a GIG into an immutable object, with defined properties (e.g. an index, so we can iterate over edges by child ID, an "edge" class so that we can do
edge.is_inversion()
, and so on.I have some code to start doing this. I suggest something like: