labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

Hierarchical Clustering #81

Closed. seajane closed this issue 2 months ago

seajane commented 2 years ago

Is it possible to view the hierarchical clustering tree that is created when the tile plots are made? It would be useful to see how some of the groups branched.

axbazin commented 2 years ago

Hi,

My apologies for the delayed response; I messed up my GitHub config and was no longer receiving notifications.

It is currently not possible to view the results of the clustering, though that feature could perhaps be added to the command eventually. It may be better to build an actual phylogeny instead. You may not get the same clustering, but because the hierarchical clustering is based only on the presence/absence of gene families, it is not very reliable and does not really replace an actual phylogeny if you wish to see how your genomes are related to each other.

Adelme

seajane commented 2 years ago

Thanks, we already have trees based on traditional phylogeny. The presence/absence tree revealed some unique groupings that correlate with another categorical grouping of these strains, so it was very interesting in itself, as well as showing which differential genes are present. It would be really amazing to have access to the distance and strength of this association.

axbazin commented 2 years ago

Alright, I see! I guess it is something that could be added in the future.

What is done here is basically to compute Jaccard similarities between the presence/absence vectors of gene families for each genome, then build a dendrogram from those similarities. The function used for this can actually output a plot, so adding both the matrix and the clustering tree as optional additional outputs wouldn't be too difficult, I think.
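
As a rough illustration of the two steps described above, here is a minimal sketch using NumPy and SciPy. It is not PPanGGOLiN's actual code: the toy matrix, the genome names, and the choice of average linkage are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: one row per genome, one column per gene family.
genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
presence = np.array([
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1],
], dtype=bool)

# pdist returns Jaccard *distances* in condensed form;
# the similarity is 1 - distance.
distances = pdist(presence, metric="jaccard")
similarity = 1.0 - squareform(distances)

# Hierarchical clustering on the distances. Average linkage is an
# assumption here, not necessarily what the tile plot uses.
tree = linkage(distances, method="average")

# Each row of `tree` is one merge: the two cluster ids that were joined,
# the distance at which they were joined, and the size of the new cluster.
for left, right, height, size in tree:
    print(int(left), int(right), round(float(height), 3), int(size))
```

The `similarity` matrix and the merge heights in `tree` are the kind of "distance and strength" values asked about earlier in the thread.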

cmonat commented 1 year ago

Hello,

I'm also interested in getting this tree. How can I do that? Thanks a lot.

C.

ggautreau commented 1 year ago

Hi Cécile,

It seems possible to integrate dendrograms next to a heatmap with Plotly ( https://plotly.com/python/dendrogram/ ), so I will test whether this could be added to PPanGGOLiN.

My pleasure :)
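
Whatever plotting library is used, drawing a dendrogram next to a heatmap depends on one step: the heatmap rows have to follow the dendrogram's leaf order. The sketch below shows only that step with SciPy and leaves the plotting out. The data, names, and linkage method are hypothetical and not taken from PPanGGOLiN.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

# Hypothetical toy data: one row per genome, one column per gene family.
genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
presence = np.array([
    [1, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1, 1],
], dtype=bool)

# Cluster genomes on Jaccard distances (average linkage is an assumption).
tree = linkage(pdist(presence, metric="jaccard"), method="average")

# no_plot=True computes the dendrogram layout without drawing anything.
layout = dendrogram(tree, no_plot=True, labels=genomes)
row_order = layout["leaves"]  # original row indices, in leaf order

# Reorder the matrix and labels so the heatmap rows line up with the leaves.
ordered_presence = presence[row_order]
ordered_labels = [genomes[i] for i in row_order]
print(ordered_labels)
```

With the rows reordered this way, similar genomes end up adjacent in the heatmap, and the tree can be drawn alongside it.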

JeanMainguy commented 2 months ago

Hi, the tile plot has been improved in version 2.1.2. The dendrogram can now be added to the plot with the argument --add_dendrogram.

Check out the updated tile plot documentation here: https://ppanggolin.readthedocs.io/en/latest/user/PangenomeAnalyses/pangenomeAnalyses.html#tile-plot. What has been improved or added to the tile plot is also listed in the description of PR #277.