malariagen / ag1000g-phase1-vgsc-report

MIT License

Figure: haplotype structure #25

alimanfoo closed 7 years ago

alimanfoo commented 7 years ago

Create a figure showing haplotype structure.

alimanfoo commented 7 years ago

@cclarkson here's some food for discussion tomorrow:

image

...this is a clustering of haplotypes carrying L995S. Only distinct haplotypes are used to construct the tree. Haplotype frequencies are shown in the adjacent bar plot. Shaded regions show the 5 largest clusters by a method I'll explain tomorrow - I'm thinking we show one network for each of these 5 clusters.
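A minimal sketch of the idea described above: collapse to distinct haplotypes, keep their frequencies, and cluster only the distinct ones. The toy data, the single-linkage method, and the distance cut-off used to define clusters are all assumptions for illustration; the comment does not say which method was actually used to pick the 5 largest clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy haplotype matrix (assumed data): rows are haplotypes,
# columns are variant sites (0 = reference allele, 1 = alternate allele).
haplotypes = np.array([
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
])

# Collapse to distinct haplotypes and count how often each one occurs.
# The counts are what a frequency bar plot next to the tree would show.
distinct, counts = np.unique(haplotypes, axis=0, return_counts=True)

# Pairwise distance as the number of differing sites
# (scipy's "hamming" metric is a proportion, so scale by the site count).
dist = pdist(distinct, metric="hamming") * distinct.shape[1]

# Hierarchical clustering of the distinct haplotypes only.
tree = linkage(dist, method="single")

# Assumed cluster definition: cut the tree at a distance of 1 difference.
labels = fcluster(tree, t=1, criterion="distance")

# Total number of haplotypes (not just distinct ones) in each cluster.
cluster_sizes = {}
for label, count in zip(labels, counts):
    cluster_sizes[int(label)] = cluster_sizes.get(int(label), 0) + int(count)

# Keep the largest clusters, up to 5.
largest = sorted(cluster_sizes.items(), key=lambda kv: kv[1], reverse=True)[:5]
```

With real data the cut-off and linkage method would need to be chosen to match whatever cluster definition is eventually agreed.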

alimanfoo commented 7 years ago

Hey @cclarkson, maybe Python is an option after all, I made this with graphviz:

image

cc @hardingnj.
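For readers unfamiliar with driving graphviz from Python, one dependency-free route is to write DOT source as text and hand it to graphviz afterwards. The sketch below is not the code used for the image above; the function name and the input format (a dict of haplotype counts plus a list of edges) are hypothetical.

```python
def haplotype_network_dot(nodes, edges):
    """Return DOT source for an undirected haplotype network.

    nodes: dict mapping node name -> number of haplotypes (hypothetical format)
    edges: iterable of (name_a, name_b, n_differences) tuples
    """
    lines = [
        "graph haplotype_network {",
        "  layout=neato;",
        '  node [shape=circle, style=filled, fillcolor=lightgrey, label=""];',
    ]
    for name, count in nodes.items():
        # Scale the diameter by sqrt(count) so node area tracks haplotype count.
        width = 0.2 * count ** 0.5
        lines.append(f'  "{name}" [width={width:.2f}, fixedsize=true];')
    for a, b, n_diff in edges:
        # "len" is the preferred edge length used by the neato layout engine.
        lines.append(f'  "{a}" -- "{b}" [len={n_diff}];')
    lines.append("}")
    return "\n".join(lines)


dot_source = haplotype_network_dot(
    nodes={"h1": 4, "h2": 1, "h3": 1},
    edges=[("h1", "h2", 1), ("h1", "h3", 1)],
)
```

The resulting text can be saved to a `.dot` file and rendered with the graphviz command-line tools.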

cclarkson commented 7 years ago

That's great, will be cool if we could keep most of it in python. Looking forward to seeing the code.

alimanfoo commented 7 years ago

Making some progress, here's a network with some real data, the Kenya/Uganda L995S cluster:

image

Will turn this into a usable function and share asap.

alimanfoo commented 7 years ago

Still not sure how to figure out which edge has which mutation, that may be tricky...
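One possible starting point, assuming each node in the network has a known haplotype sequence: the mutations on an edge are the sites at which the two connected haplotypes differ. The function name and the example positions below are hypothetical, and inferred intermediate nodes (as in median-joining) would only work if their sequences are also available.

```python
def edge_mutations(hap_a, hap_b, positions):
    """Return the positions at which two equal-length haplotypes differ."""
    if not (len(hap_a) == len(hap_b) == len(positions)):
        raise ValueError("haplotypes and positions must have the same length")
    return [pos for a, b, pos in zip(hap_a, hap_b, positions) if a != b]


# Toy example: two haplotypes over five sites with made-up coordinates.
positions = [101, 102, 103, 104, 105]
diffs = edge_mutations("ACGTA", "ACCTA", positions)
# diffs == [103]
```

An edge spanning more than one difference would return several positions, so the label would need to list all of them.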

cclarkson commented 7 years ago

Hi @alimanfoo ,

Just to document the TCS comparison: almost all clusters produced networks identical to those from your function. The exceptions were 995F8, which TCS did not circularise, and 995F16, where TCS added a number of loops compared to the network you generated. See below (I tried to lay it out the same way for comparison, but TCS kept crashing, so this is the best I managed; the bottom right-hand corner is a bit messy!).

995f_cluster16_tcscomp

alimanfoo commented 7 years ago

Thanks Chris. Can you compare with the networks I made using median-joining? They should be in the (recently updated) hapclust_demo notebook, look out for the calls to graph_haplotype_network(..., network_method='mjn'). What you have above in TCS looks very similar to what I made with MJN...

image

cclarkson commented 7 years ago

@alimanfoo

Yep, I'm pretty sure that is the same as the TCS one, interesting.

ION - having a problem getting graphviz to work due to neato - how did you fix that problem?

cclarkson commented 7 years ago

No worries, think I've cracked it.

cclarkson commented 7 years ago

Hi @alimanfoo ,

Now I've got this all up and running again, I was just trying to plot multiple networks on one figure but I'm struggling. I can't see how to get it to use an 'ax', is this because it is just a wrapper for external code and can't talk to matplotlib?

alimanfoo commented 7 years ago

Hi Chris, sorry yes, I haven't figured out the best way to get the networks into a mpl figure. I'll have a play around...


-- Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721

cclarkson commented 7 years ago

Hi @alimanfoo ,

I'm trying to find a nice way to bring the dendrogram together with the clusters but I'm struggling to work out how to give the fig_haplotypes_clustered function (from haplclust_utils) an ax to plot on in my composite figure because the dendrogram function is already a composite figure. Any pointers would be gratefully received!

Cheers, C.

alimanfoo commented 7 years ago

Hi Chris, I would suggest building each panel of the figure as a separate matplotlib figure, saving out to jpeg or png, then composing the whole figure in libreoffice draw, at least for now.
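A minimal sketch of the one-figure-per-panel approach suggested above, assuming each panel can be drawn by a function that takes a matplotlib axes. The helper name, file names, and placeholder plots are hypothetical.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt


def save_panel(draw, path, figsize=(4, 3), dpi=150):
    """Draw one panel on its own figure and save it to an image file."""
    fig, ax = plt.subplots(figsize=figsize)
    draw(ax)
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)
    return path


# Placeholder drawing functions standing in for the real dendrogram
# and frequency plots.
panels = {
    "dendrogram.png": lambda ax: ax.plot([0, 1, 2], [0, 1, 0]),
    "frequencies.png": lambda ax: ax.bar(["A", "B"], [3, 5]),
}

out_dir = tempfile.mkdtemp()
saved = [
    save_panel(draw, os.path.join(out_dir, name))
    for name, draw in panels.items()
]
```

The saved files can then be arranged by hand in a drawing tool such as LibreOffice Draw.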


cclarkson commented 7 years ago

test_fig

Been flailing at this for a while and I think I am finally getting somewhere! I'm now trying to work out how to colour the networks' backgrounds like the dendrogram (and align/scale things in a less awful way).

alimanfoo commented 7 years ago

Cool. If you think it isn't going to work to fit the tree and networks in the same figure, please feel free to rethink.


cclarkson commented 7 years ago

As far as I can see, importing the graphviz image as an image array means that you can't overlay or 'draw' on top of those arrays. I'm going to try and kick out the graphviz stuff in a vector format so that I can build something a bit like this:

inkscape_example_vgsc_netclustfig

I think this figure could then be quite small, losing the fine network details, but we could have individual figures for the clusters of interest. What do you think?
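For reference, the graphviz command-line tools can write vector output directly via the `-T` format flag. The sketch below builds such a command from Python; the helper names and the file names `network.dot` and `network.svg` are hypothetical, and rendering only works if the graphviz binaries are installed.

```python
import shutil
import subprocess


def graphviz_command(dot_path, out_path, fmt="svg", engine="neato"):
    """Build the command line for rendering a DOT file to a given format."""
    return [engine, f"-T{fmt}", dot_path, "-o", out_path]


def render(dot_path, out_path, fmt="svg", engine="neato"):
    """Run graphviz if the layout engine is available on the PATH."""
    if shutil.which(engine) is None:
        raise RuntimeError(f"graphviz engine {engine!r} not found on PATH")
    subprocess.run(graphviz_command(dot_path, out_path, fmt, engine), check=True)
    return out_path


# Only the command is built here; call render(...) to actually run it.
cmd = graphviz_command("network.dot", "network.svg")
```

An SVG produced this way can be opened and edited in Inkscape, which keeps nodes and edges as separate objects.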

alimanfoo commented 7 years ago

Hi Chris,

You should be able to plot over the top of an image if you want to. I just tried it, did imshow() then plot() to plot a line over the top, seemed to work. What did you want to overlay on the image?

Re the networks being small, that may be OK for some, I guess the key thing is to be able to see the topology, the colours of each node (including singletons), and labels for any non-synonymous edges.

I think F1 will need its own figure though. But do some playing around and see what works.
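A small sketch of the `imshow()` then `plot()` approach mentioned above, drawing on an existing axes so that several networks can share one figure. The blank array is a stand-in for a rendered network image (which could be loaded with `plt.imread`), and the red line is just an example overlay.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for a rendered network image: a white 100 x 150 RGB array.
image = np.ones((100, 150, 3))

# One axes per network, all within a single matplotlib figure.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax in axes:
    ax.imshow(image)
    # Anything plotted afterwards is drawn over the image, in pixel coordinates.
    ax.plot([10, 140], [50, 50], color="red", linewidth=2)
    ax.set_axis_off()
```

Because the overlay uses the image's pixel coordinates, positions need to be recomputed if the image is re-rendered at a different size.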


cclarkson commented 7 years ago

Hi @alimanfoo,

This was what I was trying to do with just code, but I resorted to a bit of Inkscape in the end. It looks okay, but we would need to make it clear that the network nodes are not to scale across networks; they are scaled only relative to their own cluster. I'm just going to make the standalone figure for cluster 16.

vgsc_netclustfig

- Should I push on with this format for 995S?
- What do you think about leaving the colony haplotypes in this analysis?
- At some stage I saw the network function able to add the size of node (number of haps) to the figure. I'm trying to find this to make a size key; is it still there?
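To illustrate the scaling point, here is a sketch of node sizing relative to each cluster versus relative to a shared reference count. The function name, the square-root scaling, and the counts are assumptions, not the behaviour of the existing network function.

```python
import math


def node_widths(counts, max_width=1.0, reference=None):
    """Map haplotype counts to node widths so that node area tracks count.

    With reference=None the largest node in this cluster gets max_width,
    so sizes are not comparable between clusters. Passing one shared
    reference count to every cluster puts them all on the same scale.
    """
    ref = max(counts.values()) if reference is None else reference
    return {name: max_width * math.sqrt(n / ref) for name, n in counts.items()}


cluster_a = {"h1": 100, "h2": 25}
cluster_b = {"h3": 4, "h4": 1}

# Per-cluster scaling: both clusters have a largest node of width 1.0.
relative_a = node_widths(cluster_a)
relative_b = node_widths(cluster_b)

# Shared scaling: cluster_b nodes shrink relative to cluster_a.
shared_b = node_widths(cluster_b, reference=100)
```

A size key is only meaningful under the shared scaling, or if a separate key is drawn for each network.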

cclarkson commented 7 years ago

Hi @alimanfoo

Not sure how I feel about the overlapping nodes, but they can all be re-arranged nicely in Inkscape if we decide to use this type of figure...

995f_cluster_16_standalone_rededge

cclarkson commented 7 years ago

Figure 2 (draft) - haplogroup distance and frequency

figure2_dendfreq_draft

alimanfoo commented 7 years ago

Looking good.


alimanfoo commented 7 years ago

Superseded by #69 and #73.