refinery-platform / heatmap-scatter-dash

Interactive visualizations for differential expression
MIT License

PCA x metadata heatmap #138

Open mccalluc opened 6 years ago

mccalluc commented 6 years ago

Once we have metadata, a heatmap of principal components by metadata fields would be useful: the idea is to get a sense of which underlying attributes drive the clusters. (I'm thinking this would be another tab in the top right area.)

mccalluc commented 6 years ago

me:

heatmap: There's the main one, and I need to work on that, but one of you also mentioned another heatmap where one axis is the principal components and the other axis is metadata fields, and from that you get a sense of what characteristics are reflected in each of the PCs. I think this second one is just the result of matrix multiplication, but I wanted to confirm.

john:

Oh, yes. That one is actually a bit more complicated: the plot is a visualization of the output of the method used in this paper: https://www.ncbi.nlm.nih.gov/pubmed/28350385 . It might be best to file that for later and talk to Lorena in our group (cc:’d) about how she implemented it in R (in the degCovariates function of her R package DEGreport).

i.e., it's not just multiplying the matrices. Going to take this out of the milestone and come back to it later when the requirements are clearer.

lpantano commented 6 years ago

Hi,

Here you can find an example of the heatmap: http://lpantano.github.io/DEGreport/reference/degCovariates.html

There are two inputs: the expression matrix and the metadata. PCA is computed from the expression matrix, which gives the PC values for each sample. These values are then correlated with each column of the metadata, yielding a correlation value and an adjusted p-value for every column/PC pair. The colors in the heatmap represent the correlation between each metadata column and each PC. Non-significant correlations are shown in grey (NA) in the heatmap.
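For the Python side, the steps above might be sketched roughly as follows. This is not a port of degCovariates: the function names, the genes-by-samples layout, the numeric-only metadata, the Pearson correlation, the permutation p-values, and the hand-rolled Benjamini-Hochberg adjustment are all assumptions made for illustration, and the R package may use different choices.

```python
import numpy as np


def pc_scores(expression, n_components=3):
    """Per-sample PC scores. `expression` is assumed to be genes x samples."""
    X = expression.T                      # samples x genes
    X = X - X.mean(axis=0)                # center each gene
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def permutation_pvalue(a, b, n_perm=999, seed=0):
    """Two-sided permutation p-value (a stand-in for a parametric test)."""
    rng = np.random.default_rng(seed)
    observed = abs(pearson(a, b))
    hits = sum(abs(pearson(a, rng.permutation(b))) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, walking down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def pc_metadata_heatmap(expression, metadata, n_components=3, alpha=0.05):
    """`metadata` is assumed to map column name -> numeric value per sample.

    Returns (names, matrix); matrix[i, j] is the correlation of column i
    with PC j, or NaN where the adjusted p-value is not below `alpha`.
    """
    scores = pc_scores(expression, n_components)
    names = list(metadata)
    corr = np.empty((len(names), n_components))
    pvals = np.empty_like(corr)
    for i, name in enumerate(names):
        values = np.asarray(metadata[name], dtype=float)
        for j in range(n_components):
            corr[i, j] = pearson(values, scores[:, j])
            pvals[i, j] = permutation_pvalue(values, scores[:, j])
    padj = benjamini_hochberg(pvals.ravel()).reshape(pvals.shape)
    return names, np.where(padj < alpha, corr, np.nan)
```

The NaN cells would map to the grey (NA) cells in the heatmap. Categorical metadata would need to be encoded numerically first, or a different association measure used.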

Additionally, a dendrogram can be added for the metadata columns to indicate how the metadata variables correlate with each other. Basically, a correlation matrix is built from pairwise comparisons between the metadata columns; the dendrogram is then generated from it with a clustering algorithm and added to the figure. (In this case, the order of the columns in the heatmap needs to match the order of the dendrogram.)
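A rough Python sketch of the dendrogram step, assuming SciPy is available. The helper name, the numeric-only metadata, the `1 - |correlation|` distance, and average linkage are illustrative assumptions, not necessarily what DEGreport does.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def metadata_column_order(metadata):
    """Cluster metadata columns by pairwise correlation.

    `metadata` is assumed to map column name -> numeric value per sample.
    Returns (ordered_names, tree); the heatmap should use `ordered_names`
    so that its columns line up with the dendrogram leaves.
    """
    names = list(metadata)
    table = np.array([metadata[n] for n in names], dtype=float)  # columns x samples
    corr = np.corrcoef(table)
    # Strongly correlated (or anti-correlated) columns count as "close".
    distance = np.clip(1.0 - np.abs(corr), 0.0, None)
    distance = (distance + distance.T) / 2.0
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method="average")
    return [names[i] for i in leaves_list(tree)], tree
```

The returned `tree` is a SciPy linkage matrix, so it could be passed to `scipy.cluster.hierarchy.dendrogram` to obtain coordinates for drawing.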

Let me know if you need more info.