labsquare / cutevariant

A standalone and free application to explore genetics variations from VCF file
https://cutevariant.labsquare.org/
GNU General Public License v3.0

Create a Pedigree Chart #193

Open ysard opened 4 years ago

ysard commented 4 years ago

Probably related to #101.

I have been reading the documentation of SnpEff & SnpSift: https://pcingola.github.io/SnpEff/examples/

By the way, I strongly recommend writing such a tutorial before any new implementation of anything; it helps people know what they can do with software and how they can do it.

It's the difference between displaying a batch of raw data and teaching people how to use it. The current state of the following fields illustrates this problem well:

case_count_hom
case_count_het
case_count_ref
control_count_hom
control_count_het
control_count_ref
count_hom
count_het
count_ref
count_var

I noticed that SnpSift is able to compute the homozygous non-reference, heterozygous and total allele counts in cases and controls for each variant.

As far as I can tell, we have also been doing this recently.
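
To make the meaning of the fields listed above more concrete, here is a minimal sketch of how such counts could be derived for a single variant. The genotype encoding (number of alternate alleles per sample), the `count_genotypes` helper and the sample names are assumptions made for illustration; this is not cutevariant's actual implementation.

```python
from collections import Counter

# Assumed encoding: number of alternate alleles per sample (0 = ref, 1 = het, 2 = hom).
LABELS = {0: "ref", 1: "het", 2: "hom"}

def count_genotypes(genotypes, cases):
    """Count hom/het/ref genotypes overall, in cases and in controls.

    genotypes: dict mapping sample name -> alt allele count (0, 1 or 2)
    cases: set of sample names considered affected; all others are controls
    """
    counts = Counter()
    for sample, gt in genotypes.items():
        label = LABELS.get(gt)
        if label is None:  # skip missing or unexpected genotypes
            continue
        group = "case" if sample in cases else "control"
        counts[f"count_{label}"] += 1
        counts[f"{group}_count_{label}"] += 1
    return counts

# Hypothetical trio: an affected child and two unaffected parents.
genotypes = {"child": 2, "father": 1, "mother": 1}
counts = count_genotypes(genotypes, cases={"child"})
print(counts["case_count_hom"], counts["control_count_het"], counts["count_ref"])
```

A short explanation like this next to each field name would already tell users what the numbers mean.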

However, they can also generate a family tree / pedigree that visually shows how a variant propagates through it.

=> We should be able to generate that, right?

In summary:

I'm surprised that I didn't find a widely accepted implementation for this kind of processing in Python. Do people use obscure programs or an outdated language like R?

Moreover, it is difficult for me to find clear rules and legends for such plots. I guess reading them is so obvious that everyone does without them...

Can someone write a procedure describing how to draw such a Mendelian status?

PED / tfam plot: https://cran.r-project.org/web/packages/kinship2/vignettes/pedigree.html

obscure (no doc, no examples) and old... https://github.com/wintermind/pypedal

Python => pdf: https://github.com/minorninth/pedigree
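
For reference, the PED / tfam input mentioned above is easy to read without any dependency. The sketch below assumes the usual six leading columns (family ID, individual ID, father ID, mother ID, sex, phenotype) with "0" for an unknown parent; the `Individual`, `parse_ped` and `children_by_couple` names and the sample family are hypothetical.

```python
from typing import NamedTuple

class Individual(NamedTuple):
    family: str
    name: str
    father: str     # "0" means unknown / founder
    mother: str     # "0" means unknown / founder
    sex: str        # "1" = male, "2" = female
    phenotype: str  # "1" = unaffected, "2" = affected

def parse_ped(text):
    """Parse the first six whitespace-separated columns of PED/tfam lines."""
    individuals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ind = Individual(*line.split()[:6])
        individuals[ind.name] = ind
    return individuals

def children_by_couple(individuals):
    """Group children by (father, mother) pair, skipping founders."""
    couples = {}
    for ind in individuals.values():
        if ind.father != "0" and ind.mother != "0":
            couples.setdefault((ind.father, ind.mother), []).append(ind.name)
    return couples

# Hypothetical family used only for illustration.
PED = """\
FAM1 father 0 0 1 1
FAM1 mother 0 0 2 1
FAM1 child father mother 1 2
"""
individuals = parse_ped(PED)
couples = children_by_couple(individuals)
print(couples)
```

Grouping children by couple is a deliberate choice here: a person who remarries simply appears in several (father, mother) keys. Lines with fewer than six columns are not handled in this sketch.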

dridk commented 4 years ago

It would be awesome to show a pedigree tree like this and display the genotype status of each sample when clicking on a variant. But drawing a pedigree chart is a really complex problem for complex families. It requires some constraint programming to resolve the nodes' positions.

I suggest creating a "pedigree" plugin to display the family tree based on this.

Otherwise, we can draw a plain graph where small nodes represent a union. But this is not standard and may confuse users.

image
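
A minimal sketch of the "small union node" idea, generating DOT source as plain text so that no Graphviz binding is needed at this stage. The `pedigree_to_dot` helper, the `union_N` naming and the use of boxes for every individual are assumptions for illustration; a real pedigree would distinguish sexes and affected status.

```python
def pedigree_to_dot(couples):
    """Return DOT source where each couple is linked through a small union node.

    couples: dict mapping (father, mother) -> list of child names
    """
    lines = ["graph pedigree {", "    node [shape=box];"]
    for index, ((father, mother), children) in enumerate(sorted(couples.items())):
        union = f"union_{index}"
        # The union node is drawn as a small point between the two parents.
        lines.append(f'    "{union}" [shape=point, width=0.05];')
        lines.append(f'    "{father}" -- "{union}";')
        lines.append(f'    "{mother}" -- "{union}";')
        for child in children:
            lines.append(f'    "{union}" -- "{child}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical trio used only for illustration.
dot_source = pedigree_to_dot({("father", "mother"): ["child"]})
print(dot_source)
```

The resulting text can be given to `dot` for layout. Sample names containing double quotes are not escaped in this sketch.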

dridk commented 4 years ago

Something like that would be awesome!

image

I guess we can use the peddraw code to let graphviz resolve the positions of the nodes. Then we read the dot file with the node positions and draw the chart manually with Qt.
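
As a rough sketch of the "read the positions back" step: instead of a full dot file, `dot -Tplain` prints a simple line-based format (`node name x y width height ...`, `edge tail head n x1 y1 ... xn yn ...`) that is easy to parse. The `parse_plain` helper and the sample output below are hand-written assumptions for illustration, not actual output of peddraw.

```python
import shlex

def parse_plain(text):
    """Extract node centres and edge control points from `dot -Tplain` output."""
    nodes, edges = {}, []
    for line in text.splitlines():
        fields = shlex.split(line)  # handles quoted names containing spaces
        if not fields:
            continue
        if fields[0] == "node":
            name, x, y = fields[1], float(fields[2]), float(fields[3])
            nodes[name] = (x, y)
        elif fields[0] == "edge":
            tail, head, n = fields[1], fields[2], int(fields[3])
            coords = [float(value) for value in fields[4:4 + 2 * n]]
            points = list(zip(coords[0::2], coords[1::2]))
            edges.append((tail, head, points))
    return nodes, edges

# Hand-written sample in the plain format (not real dot output).
SAMPLE = """\
graph 1 1.5 2.5
node father 0.375 2.25 0.75 0.5 father solid box black lightgrey
node child 0.375 0.25 0.75 0.5 child solid box black lightgrey
edge father child 4 0.375 1.99 0.375 1.5 0.375 1.0 0.375 0.51 solid black
stop
"""
nodes, edges = parse_plain(SAMPLE)
print(nodes)
print(edges)
```

Graphviz puts the origin at the bottom left, while Qt's y axis grows downward, so the y coordinates would probably need to be flipped and scaled before drawing.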

ysard commented 4 years ago

Good find with peddraw!

Don't know how it handles special cases like remarriage etc.

Yes, graphviz is made to solve this layout problem. Moreover, it is possible to get the coordinates of the nodes (and maybe of the edges) via networkx, without a temporary dot file. The layout of the edges is probably the biggest problem.

https://stackoverflow.com/questions/13938770/how-to-get-the-coordinates-from-layout-from-graphviz

pos = nx.drawing.nx_agraph.graphviz_layout(G, prog='dot', args='-Grankdir=LR')

It uses dot in the background.

Or via pygraphviz directly:

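import pygraphviz as pgv

# Hypothetical values: the node names, the edge and the ratios are examples only.
x_ratio = y_ratio = 1.0
agraph = pgv.AGraph(splines="ortho")
agraph.add_node("father")
agraph.add_node("child")
agraph.add_edge("father", "child")
agraph.layout(prog="dot")  # requires the dot executable in PATH
for node in agraph.nodes():
    # After layout(), "pos" is a string such as "27,90"
    x_str, y_str = node.attr["pos"].split(",")
    xcoord = float(x_str) * x_ratio
    ycoord = float(y_str) * y_ratio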

The prog keyword argument of layout() requires the corresponding executable to be in PATH:

Available layouts:

    - dot - "hierarchical" or layered drawings of directed graphs.
    This is the default tool to use if edges have directionality.

    - neato - "spring model" layouts.
    This is the default tool to use if the graph is not too large
    (about 100 nodes) and you don't know anything else about it.
    Neato attempts to minimize a global energy function, which is equivalent
    to statistical multi-dimensional scaling.

    - fdp - "spring model" layouts similar to those of neato, but does
    this by reducing forces rather than working with energy.

    - sfdp - multiscale version of fdp for the layout of large graphs.

    - twopi - radial layouts, after Graham Wills 97.
    Nodes are placed on concentric circles depending their distance from a
    given root node.

    - circo - circular layout, after Six and Tollis 99, Kauffman and Wiese 02.
    This is suitable for certain diagrams of multiple cyclic structures,
    such as certain telecommunications networks.

.. seealso:: https://www.graphviz.org/

But I think graphviz, and graphviz for Python (via pygraphviz and pydot), have system dependencies and system PATH requirements...
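
One way a plugin could cope with this is to check at startup which layout programs are actually reachable, and disable the pedigree view otherwise. This is a minimal sketch using only the standard library; the `available_layouts` helper is hypothetical.

```python
import shutil

# Layout programs listed above; each is a separate Graphviz executable.
LAYOUT_PROGRAMS = ("dot", "neato", "fdp", "sfdp", "twopi", "circo")

def available_layouts(programs=LAYOUT_PROGRAMS):
    """Map each layout program to its path, or None if it is not in PATH."""
    return {prog: shutil.which(prog) for prog in programs}

found = available_layouts()
missing = [prog for prog, path in found.items() if path is None]
if missing:
    print("Not found in PATH:", ", ".join(missing))
```

This only tells whether the executables exist; it does not check that pygraphviz or pydot themselves are installed.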

To be continued...