theislab / ehrapy

Electronic Health Record Analysis with Python.
https://ehrapy.readthedocs.io/
Apache License 2.0
Correlation plot for variable dependencies #745

Open mhaist94 opened 5 months ago

mhaist94 commented 5 months ago

Description of feature

Hi all,

one feature that might help uncover new biological insights in large, heterogeneous datasets is visualizing the dependencies between the variables in the dataset. While you can explore these in UMAP space after initial clustering, it can be hard to spot all the associations there. One way of showing these dependencies is, of course, the simple cluster heatmap function. To add some statistics, it would probably be useful to also be able to run a Spearman correlation analysis corrected for multiple testing and plot the results either as a correlation matrix (to show the correlation coefficients) or, if you want to stress the dependencies rather than the strength of the correlations, as a chord diagram. With such a tool, you could spot which factors correlate with your main endpoint and might therefore be surrogates of the endpoint (or confounding factors that could be explored further in the module for detecting and dealing with bias). Hope this helps!
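
To make the request more concrete, here is a rough sketch of the kind of analysis meant above. It is not ehrapy API: the function names `spearman_with_fdr` and `plot_correlation_matrix`, the column names, and the synthetic data are all made up for illustration. It assumes a plain pandas DataFrame of numeric variables, uses `scipy.stats.spearmanr` for the pairwise correlations, and picks Benjamini-Hochberg as one possible multiple-testing correction.

```python
import numpy as np
import pandas as pd
from scipy import stats


def spearman_with_fdr(df: pd.DataFrame, alpha: float = 0.05):
    """Pairwise Spearman correlations with Benjamini-Hochberg adjusted p-values.

    Hypothetical helper for illustration only, not an ehrapy function.
    """
    cols = list(df.columns)
    n = len(cols)
    rho = pd.DataFrame(np.eye(n), index=cols, columns=cols)
    pairs, pvals = [], []
    for i in range(n):
        for j in range(i + 1, n):
            # Use only rows where both variables are observed.
            pair = df[[cols[i], cols[j]]].dropna()
            r, p = stats.spearmanr(pair.iloc[:, 0], pair.iloc[:, 1])
            rho.iloc[i, j] = rho.iloc[j, i] = r
            pairs.append((i, j))
            pvals.append(p)

    # Benjamini-Hochberg: scale sorted p-values by m / rank, then enforce
    # monotonicity from the largest p-value downwards.
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    scaled = pvals[order] * m / np.arange(1, m + 1)
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.clip(adj_sorted, 0.0, 1.0)

    # Diagonal stays NaN: a variable is not tested against itself.
    p_adj = pd.DataFrame(np.full((n, n), np.nan), index=cols, columns=cols)
    for (i, j), q in zip(pairs, adj):
        p_adj.iloc[i, j] = p_adj.iloc[j, i] = q
    return rho, p_adj, p_adj < alpha


def plot_correlation_matrix(rho: pd.DataFrame, significant: pd.DataFrame):
    """Heatmap of rho; cells significant after correction are marked with '*'."""
    import matplotlib.pyplot as plt  # imported lazily, only needed for plotting

    fig, ax = plt.subplots()
    im = ax.imshow(rho.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(rho.columns)))
    ax.set_xticklabels(rho.columns, rotation=90)
    ax.set_yticks(range(len(rho.index)))
    ax.set_yticklabels(rho.index)
    for i in range(rho.shape[0]):
        for j in range(rho.shape[1]):
            if significant.iloc[i, j]:
                ax.text(j, i, "*", ha="center", va="center")
    fig.colorbar(im, ax=ax, label="Spearman rho")
    return fig


# Synthetic example data (made-up variables, not from any real dataset).
rng = np.random.default_rng(0)
n_obs = 200
age = rng.normal(60, 10, n_obs)
example = pd.DataFrame(
    {
        "age": age,
        "sbp": 0.8 * age + rng.normal(0, 5, n_obs),  # constructed to depend on age
        "noise": rng.normal(size=n_obs),  # independent of the other two
    }
)
rho, p_adj, significant = spearman_with_fdr(example)
```

Calling `plot_correlation_matrix(rho, significant)` on the result would draw the correlation matrix variant; a chord diagram would need a different plotting backend and is not sketched here.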