sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0
Pedigree visualisation #1097

Closed timothymillar closed 1 year ago

timothymillar commented 1 year ago

Add a simple method for pedigree visualization using graphviz.

I've been thinking about documentation/examples for working with pedigree data, and one of the current limitations is visualization. I'm not aiming for "publication ready" visuals, but rather a straightforward method for sense-checking data (similar to display_genotypes).

The simplest option is to use graphviz, which is already an optional dependency for visualizing dask graphs (and could remain optional here). This won't be very performant for larger pedigrees, but it should be good enough for common use-cases. We can allow some customization by passing through arrays to use as node and edge attributes (broadcast to the appropriate size).
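To make the broadcasting idea concrete, here is a minimal sketch of how node and edge attribute arrays might be expanded and written out as Graphviz DOT source. It is not the proposed implementation: `pedigree_to_dot` is a hypothetical helper, and it assumes parents are given as a `(samples, parents)` array of integer indices with `-1` marking an unknown parent. It depends only on numpy and returns a plain string.

```python
import numpy as np


def pedigree_to_dot(sample_ids, parent, node_attrs=None, edge_attrs=None):
    """Build Graphviz DOT source for a pedigree (hypothetical helper).

    sample_ids: sequence of n sample names.
    parent: (n, p) integer array of parent indices, -1 for an unknown parent
        (an assumed encoding for this sketch).
    node_attrs: dict of attribute name -> value broadcastable to (n,).
    edge_attrs: dict of attribute name -> value broadcastable to (n, p).
    """
    parent = np.asarray(parent)
    n_samples, n_parents = parent.shape

    # Default node labels are the sample ids; user-supplied attributes override.
    node_attrs = {"label": list(sample_ids), **(node_attrs or {})}
    node_attrs = {
        key: np.broadcast_to(np.asarray(value), (n_samples,))
        for key, value in node_attrs.items()
    }
    edge_attrs = {
        key: np.broadcast_to(np.asarray(value), (n_samples, n_parents))
        for key, value in (edge_attrs or {}).items()
    }

    def fmt(attrs, index):
        # Render the attributes of one node or edge as 'key="value", ...'.
        return ", ".join(f'{key}="{value[index]}"' for key, value in attrs.items())

    lines = ["digraph pedigree {"]
    for i in range(n_samples):
        lines.append(f"  {i} [{fmt(node_attrs, i)}];")
    for i in range(n_samples):
        for j in range(n_parents):
            p = parent[i, j]
            if p < 0:
                continue  # unknown parent: no edge to draw
            attrs = fmt(edge_attrs, (i, j))
            suffix = f" [{attrs}]" if attrs else ""
            lines.append(f"  {p} -> {i}{suffix};")
    lines.append("}")
    return "\n".join(lines)


# Toy trio with made-up gametic ploidy values, one per (sample, parent) pair.
sample_ids = ["S0", "S1", "S2"]
parent = [[-1, -1], [-1, -1], [0, 1]]
tau = np.array([[2, 2], [2, 2], [2, 1]])
colors = np.where(tau == 2, "black:black", "black")
dot = pedigree_to_dot(sample_ids, parent, edge_attrs={"color": colors})
print(dot)
```

A scalar such as `edge_attrs={"color": "grey"}` broadcasts to every edge in the same way. If the graphviz Python package is installed, the returned string could be rendered with `graphviz.Source(dot)`.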

Here are some rough examples using the mixed-ploidy pedigree published by Hamilton and Kerr (with simulated genotypes):

Default to using sample_id as labels:

visualize_pedigree(ds)

[Image: HK_pedigree_1 gv]

Use genotype strings as labels:

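from sgkit.display import genotype_as_bytes

# Genotype strings for the first variant, converted from bytes to unicode
labels = genotype_as_bytes(ds.call_genotype.values[0], phased=False).astype('U')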
visualize_pedigree(
    ds,
    node_attrs=dict(label=labels)
)

[Image: HK_pedigree_2 gv]

Indicating gametic ploidy (tau) with edge style:

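import xarray as xr

# Double line for edges where the parent contributes a diploid gamete (tau == 2)
edges = xr.where(
    ds.stat_Hamilton_Kerr_tau == 2,
    "black:black",
    "black",
)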
visualize_pedigree(
    ds,
    node_attrs=dict(label=labels),
    edge_attrs=dict(color=edges),  # broadcast to shape (samples, parents)
)

[Image: HK_pedigree_3 gv]

timothymillar commented 1 year ago

Related to #1012

benjeffery commented 1 year ago

Nice!

jeromekelleher commented 1 year ago

We used networkx to do this in the msprime docs. Gives reasonably good results, for not much code. See this example: https://tskit.dev/msprime/docs/latest/ancestry.html#pedigrees-and-demography

Using a pre-existing dependency is a good call though, we don't want to pull in more stuff just for this.