tskit-dev / tsviz

Visualisation tools for tree sequences
MIT License

Add "leaf trace" visualisation #1

Open jeromekelleher opened 4 years ago

jeromekelleher commented 4 years ago

A useful visualisation is the leaf trace idea used in ARGWeaver. Examples:

https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1004342.s002&type=supplementary

https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1004342.s014&type=supplementary
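One simple reading of the leaf trace idea: each sample is a line running along the genome, drawn at a height given by its position in the left-to-right leaf order of the local tree, so a line jumps wherever a tree boundary reorders the leaves. Below is a minimal sketch. It assumes a hypothetical representation (not tskit's API) in which each local tree is a `(left, right, children)` tuple, with `children` mapping each internal node ID to its ordered child list. It does not attempt to reproduce finer details of the ARGWeaver figures, such as how line spacing is chosen.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (left, right, children) for each local tree,
# where `children` maps an internal node ID to its ordered list of child IDs.
# Leaves are node IDs that never appear as keys in `children`.
Tree = Tuple[float, float, Dict[int, List[int]]]


def find_root(children: Dict[int, List[int]]) -> int:
    """Return the single internal node that is not a child of any other node."""
    all_children = {c for kids in children.values() for c in kids}
    roots = [node for node in children if node not in all_children]
    assert len(roots) == 1, "expected exactly one root"
    return roots[0]


def leaf_order(children: Dict[int, List[int]]) -> List[int]:
    """Leaves in left-to-right order, following the stored child order."""
    order: List[int] = []
    stack = [find_root(children)]
    while stack:
        node = stack.pop()
        kids = children.get(node)
        if kids is None:
            order.append(node)
        else:
            # Push children in reverse so the first child is visited first.
            stack.extend(reversed(kids))
    return order


def leaf_trace(trees: List[Tree]) -> Dict[int, List[Tuple[float, float, int]]]:
    """For each leaf, the (left, right, rank) segments it occupies along the genome."""
    traces: Dict[int, List[Tuple[float, float, int]]] = {}
    for left, right, children in trees:
        for rank, leaf in enumerate(leaf_order(children)):
            traces.setdefault(leaf, []).append((left, right, rank))
    return traces
```

For two trees over `[0, 50)` and `[50, 100)` in which leaves 0 and 1 swap sides, the traces of leaves 0 and 1 cross at position 50, while leaf 2 stays on the bottom row.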

Presumably, this would be improved on by having a canonical ordering of the nodes, as @brianzhang01 is thinking about.
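As a point of comparison, and not necessarily the ordering @brianzhang01 has in mind, one very simple deterministic rule is to sort each node's children by the smallest leaf ID beneath them. Because the result depends only on the topology and the sample IDs, two trees with the same topology are always drawn the same way, whatever order the children happened to be stored in. This rule does not, by itself, try to make adjacent trees similar. The sketch below uses the same hypothetical `children` dict representation (internal node ID to child list), which is an assumption and not tskit's API.

```python
from typing import Dict, List


def canonical_children(children: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Copy of `children` with each child list sorted by the smallest leaf ID below it."""
    min_leaf: Dict[int, int] = {}

    def smallest_leaf(node: int) -> int:
        # Memoised recursion: a leaf (a node with no child list) is its own minimum.
        if node not in min_leaf:
            kids = children.get(node)
            if kids is None:
                min_leaf[node] = node
            else:
                min_leaf[node] = min(smallest_leaf(k) for k in kids)
        return min_leaf[node]

    return {node: sorted(kids, key=smallest_leaf) for node, kids in children.items()}
```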

Arguably, though, this is bloat, and anything except the basic tree visualisations belongs in another repo (tsviz, presumably). This could include various ways of visualising a tree sequence, and we could pull in some more powerful libraries rather than having to do everything with SVG.

Any thoughts @tskit-dev/all?

agladstein commented 4 years ago

Having a separate repo could be good if you think you'll add a variety of visualization tools in the future.

molpopgen commented 4 years ago

A separate visualization package is the way to go if there will be a lot of stuff in it.

As for the node ordering, is the vision to change the spec for a tree sequence (yikes!), or to apply some rules for the order in which nodes are added to a graphic?

jeromekelleher commented 4 years ago

> As for the node ordering, is the vision to change the spec for a tree sequence (yikes!), or to apply some rules for the order in which nodes are added to a graphic?

Yes, definitely we're just looking for an ordering in which nodes are listed in the graphic!

brianzhang01 commented 4 years ago

A separate repo could potentially be more organic and flexible in terms of API, while tskit could contain the most useful, standardised functions we have to offer. For instance, and this is dreaming big, we could also have something like a PyGame or Node.js app that reads a .trees file and allows a user to interactively explore the tree sequence. Jerome has mentioned that SVG also supports interactivity; I don't know how the different options compare, but my point is that a separate repo could be more free for experimentation.

Another visualisation is to read in case-control phenotype data and colour the branches of the tree sequence according to whether they subtend cases or controls; I know @hyanwong was thinking about this.
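The core of the case/control idea is a bottom-up pass that records which phenotypes occur among the samples below each node. Each branch can then be coloured by the label of the node at its lower end. Below is a minimal sketch. It assumes the hypothetical `children` dict representation (internal node ID to child list) and that every sample not listed in `cases` is a control. Mapping the labels `"case"`, `"control"` and `"mixed"` to actual colours is left to whichever drawing backend is used.

```python
from typing import Dict, List, Set


def branch_status(children: Dict[int, List[int]], cases: Set[int]) -> Dict[int, str]:
    """Label every node "case", "control" or "mixed" by the samples beneath it.

    Leaves are node IDs that never appear as keys in `children`; a leaf in
    `cases` is a case and every other leaf is assumed to be a control.
    """
    below: Dict[int, Set[str]] = {}

    def collect(node: int) -> Set[str]:
        # Memoised recursion over the subtree rooted at `node`.
        if node not in below:
            kids = children.get(node)
            if kids is None:
                below[node] = {"case" if node in cases else "control"}
            else:
                below[node] = set().union(*(collect(k) for k in kids))
        return below[node]

    all_nodes = set(children) | {k for kids in children.values() for k in kids}
    for node in all_nodes:
        collect(node)
    return {
        node: "mixed" if len(kinds) == 2 else next(iter(kinds))
        for node, kinds in below.items()
    }
```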

Issue tskit-dev/tskit#389 which I just created describes the "canonical ordering" of samples idea in more detail, for those interested.

gtsambos commented 4 years ago

This sounds exciting and useful!

brianzhang01 commented 4 years ago

I came up with a visualisation idea at ProbGen and have put together an example here: https://github.com/brianzhang01/tskit-viz. The main ideas used in the visualisation are: 1) keyboard interactivity, which right now allows for moving along the genome, 2) a sorting order (#389 in progress) which causes adjacent trees to be more similar and makes for a clearer visualisation.
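Moving along the genome with the arrow keys amounts to keeping an index into the list of local trees. The following is a minimal sketch that is independent of pygame and is not taken from the tskit-viz code. `tree_index_at` and `step` are hypothetical helpers, and the `"left"`/`"right"` strings stand in for whatever key events the UI delivers (in pygame, `pygame.K_LEFT` and `pygame.K_RIGHT`). With a real tree sequence, the tree boundaries could come from tskit's `TreeSequence.breakpoints()`, and `TreeSequence.at()` already returns the tree covering a position.

```python
import bisect
from typing import List


def tree_index_at(breakpoints: List[float], position: float) -> int:
    """Index of the local tree covering `position`.

    `breakpoints` is the sorted list of tree boundaries, starting at 0 and ending
    at the sequence length, so tree i spans [breakpoints[i], breakpoints[i + 1]).
    """
    if not breakpoints[0] <= position < breakpoints[-1]:
        raise ValueError("position outside the sequence")
    return bisect.bisect_right(breakpoints, position) - 1


def step(index: int, key: str, num_trees: int) -> int:
    """Move one tree left or right, stopping at either end of the genome."""
    if key == "left":
        return max(index - 1, 0)
    if key == "right":
        return min(index + 1, num_trees - 1)
    return index  # Any other key leaves the current tree unchanged.
```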

I'm happy to fold this into a visualisation repo under tskit-dev. It's also a good case-study for the discussion we're having here. Some points in particular:

  1. The existing visualisation outputs are text (both Unicode and ASCII, with a priority on Unicode) and SVG. I'm not super familiar with SVG output, but my understanding is it's most useful within a Jupyter notebook. Since I don't always work in Jupyter, it's nice to have this PyGame command-line version, which is easier to spin up and gives me a big screen to see things in.

  2. My perception of the current drawing code is that it's already quite dense, and it may feel a bit hard to iterate on ideas there. A separate drawing repo wouldn't need such high software-engineering standards to start and might pull in more contributions.

  3. I think PyGame is a good framework because based on my research it's the recommended Python UI framework, which would attract development from msprime / tskit users. I can already think of a lot of extensions that could be added to this proof-of-concept, like showing GWAS with a phenotype, different clickable options in a sidebar, and animations.

jeromekelleher commented 4 years ago

The pygame thing is super cool @brianzhang01! I'm totally up for making this a tskit project, I think it's a great idea --- let's do it.

hyanwong commented 4 years ago

I like the idea of a tsviz repo. I also have some code for outputting 3D versions of a tree sequence (e.g. below), and I could add this. It's less useful for inspection of real tree sequences than @brianzhang01 's approach, but possibly useful for illustration/talks.

Re pygame, I find that on my OS X laptop running pygame 1.9.6 (Python 3.7.2), when I run python3 viz.py I get the correct command-line output, with genotypes etc., but the pygame window that pops up is entirely white, so something's up with the drawing code (CLI output below). It might be that it requires v2 of pygame?

Although I have no objections at all to pygame, I do think that SVG is a reasonable general output format, and I don't use it in Jupyter notebooks so much as use it for creating images in presentations, etc. The fact that it is viewable in a browser, and is editable are both useful properties too. I do note that you can animate SVG, so it might be possible to create something like a browser version of the pygame frontend using SVG too? Probably more hassle than it is worth, though.
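Because SVG is plain XML text, a static version of a trace-style plot can be written without any drawing library, and the result can be opened in a browser or edited in a vector editor. The following is a minimal sketch. It assumes each leaf's trace is given as a list of `(left, right, rank)` segments, a hypothetical format chosen for this example. It draws one `<polyline>` per leaf, so a change of rank at a tree boundary shows up as a vertical jump. The `id="leaf-N"` attributes are there only so that the lines could later be styled or scripted.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float, int]  # (left, right, rank) along the genome


def leaf_trace_svg(traces: Dict[int, List[Segment]], seq_len: float,
                   width: int = 600, row_height: int = 20) -> str:
    """Render one polyline per leaf: x is genome position, y is the leaf's rank."""
    num_rows = 1 + max(rank for segs in traces.values() for _, _, rank in segs)
    height = num_rows * row_height
    lines = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for leaf, segs in sorted(traces.items()):
        points = []
        for left, right, rank in sorted(segs):
            y = (rank + 0.5) * row_height  # centre of the leaf's row
            points.append(f"{left / seq_len * width:.1f},{y:.1f}")
            points.append(f"{right / seq_len * width:.1f},{y:.1f}")
        pts = " ".join(points)
        lines.append(f'<polyline id="leaf-{leaf}" points="{pts}" fill="none" stroke="black"/>')
    lines.append("</svg>")
    return "\n".join(lines)
```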

[Screenshot (2019-11-22): 3D rendering of a tree sequence]

NB: output from my console when running viz.py - should I be worried that fontsize = None?

Yans-MacBook-Air:tskit-viz yan$ python3 viz.py 
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
Command-line args:
file: None
fontsize: None
length: 3000.0
num_samples: 20
seed: None
sort: 1
Your seed is 807
Max height: 61429.87127675946
Genome length: 3000.0
Navigate with the left / right arrow keys
Genotypes [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0] from mutation at 457.2085153055234

brianzhang01 commented 4 years ago

@jeromekelleher yeah sure! I think the first steps are setting up a separate repo, and you can think of what's the best code contribution guidelines / ways of soliciting ideas that would work well.

@hyanwong re the issue, fontsize = None is the default and it works for me. I'm using Mac OS X and pygame 2.0.0.dev6 (SDL 2.0.10, python 3.5.6).

I would recommend you take a look at this issue and see if it helps. It actually did take me a while to get a setup working, and I think I was actually running pygame 1.9.6 as well earlier before using some of the recommendations in this issue. The behaviour also seems quite similar to yours: "The windows created by pygame are blank." I think the difference between PyGame 1 and 2 is that PyGame 1 uses SDL 1 and PyGame 2 uses SDL 2. You can also try using python3 art.py in my repo to debug your setup, it's a very simple thing that draws a blue circle and some black lines.

That visualisation is really nice, I think I thought of something similar at one point and called it "train tracks" in my head.

jeromekelleher commented 4 years ago

I've created a new repo, tsviz, and a new viz team in tskit-dev, which has write access to it. You, me and Yan are members. I think we should treat it as a playground for various ideas initially, with no particular structure or goal. We might develop it into something more formal later, if we want to.

Please feel free to copy your code in there, if you like! (You should have write access.)

andrewkern commented 4 years ago

sounds super cool guys!

petrelharp commented 4 years ago

omg, this is wonderful.

molpopgen commented 4 years ago

@hyanwong -- I am probably going to borrow this graphic, with attribution of course.

hyanwong commented 4 years ago

@molpopgen please do. I'll upload the viz code for it later (it's in X3DOM).

jeromekelleher commented 4 years ago

I just transferred this thread from tskit. This repo would be the place to explore the leaf-traces viz. We should take any more general discussions off to a separate thread there in the tsviz repo if we want to continue them.