Closed szhan closed 1 year ago
Tracks of information can be added. For example, sites of disagreement between lshmm and BEAGLE and chip-like sites.
It may also be useful to visually compare sample paths. For example, the HMM paths for the same sample obtained under different precision values when running lshmm.
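A minimal sketch of the comparison behind such a plot, assuming each path is a list of parent node ids with one entry per site. The function name and the example paths are hypothetical, not part of lshmm.

```python
def path_disagreements(path_a, path_b):
    """Return the site indices where two copying paths pick different parent nodes."""
    if len(path_a) != len(path_b):
        raise ValueError("paths must cover the same sites")
    return [i for i, (a, b) in enumerate(zip(path_a, path_b)) if a != b]


# Hypothetical paths for one sample under two precision settings.
path_low_precision = [3, 3, 3, 7, 7, 2, 2, 2]
path_high_precision = [3, 3, 7, 7, 7, 2, 5, 2]

diff_sites = path_disagreements(path_low_precision, path_high_precision)
print(diff_sites)
```

The resulting site indices could be drawn as their own track next to the two paths.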
Make the plots interactive using bokeh.
Initially, I ran into a websocket error. Setting the environment variable as follows solves the problem for me.
import os
os.environ["BOKEH_ALLOW_WS_ORIGIN"] = '0aaf0agotd3etfja916liv2etcl4ul9j3fk8kav1m1a16m18da6b'
The fix is from this thread: https://github.com/bokeh/bokeh/issues/8096#issuecomment-406815954.
Managed to show and interact with a sample path. Next steps are to show (1) sites where discrepancies between imputed genotypes and true genotypes occur and (2) locations of chip-like markers.
Another development version.
https://github.com/szhan/tsimpute/assets/5580375/d5dd8f0b-89c8-42ea-8788-d83b6edcf9cc
Extend it to use information in a reference panel tree sequence, e.g., relative node ages and sample status.
This differentiates between sample nodes (black squares) and non-sample nodes (grey circles) in the copying path. I should modify them to be consistent with the tree displays in tskit.
Not sure if it is a good idea to order parent nodes by time rather than by id, so I'm thinking of adding node times to the tooltips.
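A rough sketch of the tooltip idea, assuming the per-site data live in a bokeh `ColumnDataSource`. The column names and all values are made up for illustration; in practice the times and sample status would be looked up in the reference panel tree sequence.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical data: one row per site in the copying path.
source = ColumnDataSource(
    data=dict(
        position=[100, 250, 400, 900],
        parent_id=[12, 12, 7, 7],
        node_time=[0.0, 0.0, 35.2, 35.2],
        is_sample=["yes", "yes", "no", "no"],
    )
)

p = figure(
    title="Copying path",
    x_axis_label="Genomic position",
    y_axis_label="Parent node id",
)
renderer = p.scatter("position", "parent_id", source=source, size=8, color="black")

# Show the node time on hover, so the y-axis can stay ordered by node id.
hover = HoverTool(
    renderers=[renderer],
    tooltips=[
        ("node", "@parent_id"),
        ("time", "@node_time"),
        ("sample", "@is_sample"),
    ],
)
p.add_tools(hover)
# bokeh.plotting.show(p) would render the figure.
```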
I think the following additional tracks could be useful for examining copying paths:
The simplest way to visualise all these tracks is to add them as separate plots below the main plot, as is done above. But is there a more elegant way to do this?
Another dev version. The tracks can be muted via an interactive legend.
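For reference, a minimal sketch of how a mutable legend can be set up in bokeh, assuming three tracks (the path, wrongly imputed sites, and chip-like sites). The positions and node ids are hypothetical, and this is not necessarily how the dev version does it.

```python
from bokeh.plotting import figure

# Hypothetical data for one sample.
positions = [100, 250, 400, 900, 1200]
path = [12, 12, 7, 7, 3]
wrong_sites = [400]
chip_sites = [100, 900]

p = figure(
    title="Copying path with tracks",
    x_axis_label="Genomic position",
    y_axis_label="Parent node id",
)
# Every glyph gets a legend entry and a faded look for when it is muted.
p.step(positions, path, mode="after", legend_label="copying path", muted_alpha=0.1)
p.scatter(
    wrong_sites, [0] * len(wrong_sites), marker="x", size=10, color="red",
    legend_label="wrongly imputed", muted_alpha=0.1,
)
p.scatter(
    chip_sites, [0] * len(chip_sites), marker="triangle", size=10, color="green",
    legend_label="chip-like sites", muted_alpha=0.1,
)
# Clicking a legend entry now mutes or unmutes the matching track.
p.legend.click_policy = "mute"
# bokeh.plotting.show(p) would render the figure.
```

Setting `click_policy` to `"hide"` instead would remove a track entirely rather than fade it.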
I'm thinking of adding a companion plot showing all the samples with respect to their properties, such as the number of wrongly imputed alleles and the number of switches in their copying paths. This plot will interact with the above plot to allow users to select a sample in the companion plot and display its path in the above plot.
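A small sketch of the per-sample properties such a companion plot could show, assuming paths are lists of parent node ids and genotypes are lists of allele codes. The helper names and the example data are hypothetical.

```python
def count_switches(path):
    """Number of sites at which the copying path changes parent node."""
    return sum(1 for prev, cur in zip(path, path[1:]) if prev != cur)


def count_wrong_alleles(imputed, truth):
    """Number of sites where the imputed allele differs from the true allele."""
    return sum(1 for a, b in zip(imputed, truth) if a != b)


# Hypothetical samples: copying path, imputed genotypes, true genotypes.
samples = {
    "sample_0": dict(path=[3, 3, 7, 7, 2], imputed=[0, 1, 1, 0, 0], truth=[0, 1, 0, 0, 0]),
    "sample_1": dict(path=[5, 5, 5, 5, 5], imputed=[1, 1, 0, 0, 1], truth=[1, 1, 0, 0, 1]),
}

# One (num_switches, num_wrong_alleles) pair per sample.
summary = {
    name: (count_switches(d["path"]), count_wrong_alleles(d["imputed"], d["truth"]))
    for name, d in samples.items()
}
print(summary)
```

Each pair could be one point in the companion scatter plot, with the sample name used to look up the path to display.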
Another idea is overlaying a sample path on top of the forward probability matrix, which is represented as a heatmap. This may not be useful when the number of nodes in the tree sequence is large, because parent nodes with similar likelihood values are not necessarily clustered together by node id. See a prototype below.
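A rough sketch of the overlay, assuming the forward probabilities are available as a NumPy array of shape (nodes, sites). The matrix here is random and the path is hard-coded, so neither comes from an actual lshmm run.

```python
import numpy as np
from bokeh.plotting import figure

num_nodes, num_sites = 6, 10

# Stand-in forward matrix: random values, normalised so each site sums to 1.
rng = np.random.default_rng(0)
fwd = rng.random((num_nodes, num_sites))
fwd /= fwd.sum(axis=0, keepdims=True)

# Hypothetical copying path (one parent node id per site).
path = np.array([0, 0, 0, 2, 2, 2, 2, 5, 5, 5])

p = figure(
    title="Sample path over forward probabilities",
    x_range=(0, num_sites),
    y_range=(0, num_nodes),
    x_axis_label="Site index",
    y_axis_label="Parent node id",
)
# Row i of the matrix is drawn at y in [i, i + 1), column j at x in [j, j + 1).
p.image(image=[fwd], x=0, y=0, dw=num_sites, dh=num_nodes, palette="Viridis256")
# Draw the path through the centre of each heatmap cell.
site_centres = np.arange(num_sites) + 0.5
p.step(site_centres, path + 0.5, mode="center", color="white", line_width=2)
# bokeh.plotting.show(p) would render the figure.
```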
Accidentally closed this issue.
Add some plotting routines to help diagnose potential issues with sample paths. This is the working version.
Example output.