microbiome / miaViz

Microbiome Analysis Plotting and Visualization
https://microbiome.github.io/miaViz
Artistic License 2.0
10 stars 12 forks source link

Plot feature loadings function #135

Closed ElySeraidarian closed 3 months ago

ElySeraidarian commented 4 months ago

According to #92, we need a way to plot feature loadings after performing a reduction method by several ways. Here is a first version with PCA that plots feature loadings in 4 different ways : Tree, Barplot, Screeplot & Heatmap. This will be available for LDA & NMF as soon as it is possible to retrieve loadings matrix from reducedDim slot. There are parameters that can be added so that we can talk about it (for example the number of features plotted). The parameter method is probably doomed to disappear as we will be able to recognize which method has been used in the reducedDim slot. There are examples for NMF & LDA at the end for now without using the function. This will need examples in OMA and strong documentation in order to tell people how to use this function (agglomerate tree if plotting the tree, only plotting few features, ...). It is currently possible to give either loadings matrix or TSE object to plot.

I will add tests & generate documentation also later.

ElySeraidarian commented 4 months ago

Made some changes, I'll add a parameter to increase the number of principal components and maybe one optional parameter to select the name used for the matrix in reducedDim slot if it has been changed by the user? So that I'll be able to recognize the method used

ElySeraidarian commented 4 months ago

I have added support for more PC components (I have not added a parameter, it plots the number of components given by the matrix but maybe I should add if user wants to plot less?). Also added support if name is changed in reducedDim slot.

antagomir commented 4 months ago

Yes, it is necessary to give the number of components. Argument name "ncomponents" has been used elsewhere.

By default it could be ncomponents=5?

ElySeraidarian commented 4 months ago

Hadn't seen comments I'll check this

TuomasBorman commented 4 months ago

For heatmap, "shadow text" might be better

ElySeraidarian commented 4 months ago

Every change should have been done according to your comments, the values in tree layout is now accurately matching with labels (getting same results with heatmap and barplot). I'll focus on making it operational for every reduction method (detecting which method has been performed) with what @thpralas has done ( and maybe find a solution if user is not aware of these wrappers)

antagomir commented 4 months ago

Just for clarity, could you "resolve" all comments that have been addressed?

antagomir commented 4 months ago

Solution for non-wrapper users would be to support plain numeric matrices (without tree) in addition to (Tree)SE objects. If users want to include a tree then I think it is better to require that they create a TreeSE object.

ElySeraidarian commented 4 months ago

I have been able to make it work for LDA method with the wrapper

library(mia)
library(ggtree)
library(scater)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
tse <- transformAssay(tse, method = "clr", pseudocount = 1)
tse <- agglomerateByPrevalence(tse, rank="Phylum", update.tree = TRUE)
tse <- addLDA(tse)
plotLoadings(tse,  ncomponents = 2)

However I have not been able to autodetect method if reducedDim slot contains several methods, then user will have to specify name using dimred = "method_name" and I don't believe this is possible either?

antagomir commented 4 months ago

COuld you use reducedDimNames(tse) to detect?

TuomasBorman commented 4 months ago

For default layout, I would keep barplot as it is the most simple and fastest to look

TuomasBorman commented 4 months ago

I have been able to make it work for LDA method with the wrapper

library(mia)
library(ggtree)
library(scater)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
tse <- transformAssay(tse, method = "clr", pseudocount = 1)
tse <- agglomerateByPrevalence(tse, rank="Phylum", update.tree = TRUE)
tse <- addLDA(tse)
plotLoadings(tse,  ncomponents = 2)

However I have not been able to autodetect method if reducedDim slot contains several methods, then user will have to specify name using dimred = "method_name" and I don't believe this is possible either?

That is ok in my opinion. Other functions also assume that user specifies the reducedDim to plot.

For instance:

plotReducedDim(tse, dimred = "PCA") --> you have to always specify dimred in this function

TuomasBorman commented 4 months ago

I am also wondering if current heatmap layout takes full advantage of heatmaps. Heatmaps are create way to visualize large batch of data. However, in this layout it rather makes the plot more complex compared to simpler barplot. Instead, the heatmap could be in this format.

image

Current heatmap

image

Barplot with facets

image

TuomasBorman commented 4 months ago

After thinking this, we could have 2 layouts: "barplot" and "heatmap". The we could have hidden parameter "add.tree" that decides whether to add tree to heatmap (the "tree" layout is also one kind of heatmap)

The implementation could be then simplified a lot without losing features. This helps in the maintenance.

ElySeraidarian commented 3 months ago

Changes have been made, heatmap has been changed and parameter add.tree instead of tree layout

TuomasBorman commented 3 months ago

I simplified the code a lot. Check that it still works as expected. Also run BiocCheck::BiocCheck() and fix issues related to this added R file (especially, do not exceed 80 characters in line width). Also use inheritParam tag in the documentation.

TuomasBorman commented 3 months ago

Thing to discuss: should rank be row.var? --> variable defining a column from rowData. That is what it is doing currently

ElySeraidarian commented 3 months ago

Thing to discuss: should rank be row.var? --> variable defining a column from rowData. That is what it is doing currently

Yes I believe this would make more sense

antagomir commented 3 months ago

Seems good.

ElySeraidarian commented 3 months ago

But we could also still create a parameter rank that agglomerates tree as in the other functions

TuomasBorman commented 3 months ago

But we could also still create a parameter rank that agglomerates tree as in the other functions

If we agglomerate the data, we lose the match between loadings and rows. Loadings are for features of un-agglomerated data --> that is my first impression, check if that is correct

ElySeraidarian commented 3 months ago

But we could also still create a parameter rank that agglomerates tree as in the other functions

If we agglomerate the data, we lose the match between loadings and rows. Loadings are for features of un-agglomerated data --> that is my first impression, check if that is correct

Yes that's right, agglomeration has to be done before performing the reduction method

TuomasBorman commented 3 months ago

1) I removed authors. See the issue: https://github.com/microbiome/mia/issues/628 2) You should perform first agglomeration and then transform the data (fixed) 3) I added a check that the feature names match in tree heatmap

TuomasBorman commented 3 months ago

Looks very nice. Are you interested in updating OMA: https://microbiome.github.io/OMA/docs/devel/pages/beta_diversity.html#sec-dbrda-workflow

If I remember correctly, there were some error?

ElySeraidarian commented 3 months ago

Looks very nice. Are you interested in updating OMA: https://microbiome.github.io/OMA/docs/devel/pages/beta_diversity.html#sec-dbrda-workflow

If I remember correctly, there were some error?

I will update using this function, I found same results when plotting. Also this example shows how to retrieve the loadings (for now we can plot only by getting matrix and then giving the matrix to plot).