microbiome / OMA

Orchestrating Microbiome Analysis
https://microbiome.github.io/OMA
81 stars 43 forks source link

neatmap / neatsort #146

Open antagomir opened 2 years ago

antagomir commented 2 years ago

Moving here miaViz issue #40 Row/col sorting based on angle on MDS projection is implemented in SEtools R package.

The vignette says "By default, rows are sorted not with hierarchical clustering, but from the angle on a MDS plot, which tends to give nicer results than bottom-up hierarchical clustering."

Add an example on how to use this to construct so-called neatmaps.

Cite the neatmap publication as well.

antagomir commented 5 months ago

This could go, for instance, in ch. 15 (visualization) / heatmaps.

RiboRings commented 5 months ago

After some reading, It seems that sechm does order rows from the angle of the MDS plot by default (just like NeatMap, which is no longer an independent package and it works under the hood of sechm).

So OMA chapter 15.3 is already showing how to do that. According to this sechm tutorial section 2.2, It is also possible to change the ordering criterion back to standard hierarchical clustering, which could be added to OMA. However, I personally find ComplexHeatmap good enough for hierarchical clustering.

What are your thoughts?

antagomir commented 5 months ago

I agree that sechm is good for neatmaps and ComplexHeatmap for hclust.

In fact ch 15.3 could be simplified in this regard.

antagomir commented 3 months ago

This topic is now under construction in miaViz PR https://github.com/microbiome/miaViz/pull/128 - in fact I forgot we had this issue.. Let us finalize this OMA issue after the miaViz PR is complete.

@SHillman836 I gather two points from above:

1) Row/col sorting based on angle on MDS projection is implemented in SEtools R package. Our case is more general, miaViz::getNeatOrder can use PCA, MDS, or any other ordination method. I would cite this SEtools method in the documentation of miaViz::getNeatOrder just for the record. It would be good to make a quick comparison to see if there are any notable differences in performance/outcomes or speed.

2) We could consider contributing the expanded functionality (miaViz::getNeatOrder) to the SEtools package instead (from miaViz). It could fit there naturally (perhaps even more naturally than to miaViz) and support collaborative ecosystem development. Downside is that we might need to comply with the naming conventions in that package.

3) The sechm package (see above) supports SE objects and it has a similar radial angle method already implemented. I still think it is useful to have our own method because sechm does this directly as part of heatmap plotting but we have separated these functions and that has added value. In any case it would be useful to mention in the getNeatOrder function documentation (the @details slot or or in the function @example comments) that the sechm implements a similar method and includes support for SE. In OMA, we could show both, or just the miaViz implementation (sechm has other uses that will be demonstrated anyway). It would be good to have a brief look how these compare: do we get similar heatmaps with both miaViz::getNeatOrder and sechm?

SHillman836 commented 2 months ago

OK thanks - I had a few follow up questions

1) To your first point, where exactly is this SETools method? The SETools docs are pretty limited - https://www.bioconductor.org/packages/release/bioc/vignettes/SEtools/inst/doc/SEtools.html - and I might be wrong but I thought SETools got amalgamated into sechm.... and that MDS sorting method was only in sechm

2) So I've done some testing of how the sechm sorted heatmaps differ from the getNeatOrder sorted heatmaps.

When you perform an MDS on the z-transformed assay data, then order with the getNeatOrder function, then plot the ordered original z-transformed data, with the scaling parameter in sechm as FALSE, you get this -

Screenshot 2024-06-18 at 13 37 25

Then when you sort via sechm directly, passing in the relative abundance data, using the sechm scaling parameter, you get this -

Screenshot 2024-06-18 at 13 38 05

So there is some difference.

antagomir commented 2 months ago

1) If neatsorting is not available in SEtools then we can forget about that

2) Could you show the code that does this?

SHillman836 commented 2 months ago

Code for heatmap 1, with MDS is -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

tse <- transformAssay(tse, assay.type = "counts", method = "relabundance", MARGIN = "samples", name = "relabundance")

tse <- transformAssay(tse, assay.type = "relabundance", method = "z", MARGIN = "features", name = "z")

tse <- runMDS(tse,
              FUN = vegan::vegdist,
              method = "bray",
              assay.type = "z",
              name = "MDS_bray")

mds_coords <- reducedDim(tse, "MDS_bray")

sorted_order <- getNeatOrder(mds_coords, dimensions = c(1, 2), centering.method = "mean")
tse <- tse[, sorted_order]

features <- rownames(assay(tse, "z"))
sechm_plot <- sechm(tse, assayName = "z", features=features, do.scale=FALSE, cluster_rows=FALSE, sortRowsOn = NULL)

print(sechm_plot_custom_sorting)

Code for heatmap 2, direct via sechm is -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

tse <- transformAssay(tse, assay.type = "counts", method = "relabundance", MARGIN = "samples", name = "relabundance")

tse <- transformAssay(tse, assay.type = "relabundance", method = "z", MARGIN = "features", name = "z")

features <- rownames(assay(tse, "relabundance"))
sechm_plot <- sechm(tse, assayName = "relabundance", features = features, do.scale = TRUE, sortRowsOn = NULL, cluster_rows = FALSE)

print(sechm_plot)
SHillman836 commented 2 months ago

(Though I'm adding your changes over params and method calling now)

antagomir commented 2 months ago

For final examples in the R documentation it will better to have just once the parts in data preparation that are common for both methods (agglom, transf).

Code 1: I would simplify this:

reducedDim(tse, "MDS_bray") <- runMDS(tse,
              FUN = vegan::vegdist,
              method = "bray",
              assay.type = "z",
              name = "MDS_bray")

ord <- getNeatOrder(reducedDim(tse, "MDS_bray"), centering.method = "mean")

sechm_plot <- sechm(tse[, ord], assayName = "z", features=rownames(tse), do.scale=FALSE, cluster_rows=FALSE, sortRowsOn = NULL)

Some comments could be added to explain the code.

TuomasBorman commented 2 months ago

Use calculateMDS https://rdrr.io/bioc/scater/man/runMDS.html

antagomir commented 2 months ago

Usually we would recommend the following transformations for this kind of heatmap:

The reason: clr will logarithmize the data (-> becomes more normally distributed -> nice for symmetric visualization) and aims to remove the compositionality bias; then scaling on features will put center on zero and scale variance to unit for all features -> more direct visual comparability

-> Check how the plots look after this.

SHillman836 commented 2 months ago

Ah OK - just to clarify sorry should've made it clearer, so that code wasn't code to be put into the documentation or published anywhere, that was just me exploring differences between sechm and getNeatOrder heatmaps. Which is why it's abit messy and there are no comments - it was just to look at differences in heatmaps. In fact, my code to be put into the documentation uses PCA not MDS - the reason I had to do MDS for this was cause sechm only creates MDS heatmaps.

Ok yes, noted about clr. will retest the plots with this. Would you use clr not rel-abundance for PCAs too? Yeah it's not pushed yet but my updated miaViz method example uses scale for the transformAssay method

antagomir commented 2 months ago

Yep I realized afterwards. I think those transformation updates might help

antagomir commented 2 months ago

CLR can be done for both counts or relabundances.

If you use CLR + Euclidean distance, then PCA would be the common option. Otherwise, relabundance + Bray-Curtis -> MDS / NMDS

SHillman836 commented 2 months ago

Ah ok, thanks.

So for PCA -

clr for samples -> z-transformation for features-> Euclidean distance/PCA

Then for MDS -

relabundances -> clr for samples -> z transformations for features -> bray-curtis -> MDS

Let me know if I'm missing something

antagomir commented 2 months ago

yes ok - although clr you can do directly on counts as well, I think it shouldnt drastically change the outcome (or ideally, you could check that). If it doesn't change the results then I would prefer the simpler sequence (counts -> clr -> z -> ...

SHillman836 commented 2 months ago

OK so retested and there are definitely some differences.

So when ordering and plotting with sechm you get this heatmap -

Screenshot 2024-06-20 at 13 31 33

The code for this is -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

## Group data by taxonomic order
tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

## Add a pseudocount to the counts data
assay(tse, "counts") <- assay(tse, "counts") + 1

## Transform the samples into relative abundances
tse <- transformAssay(tse, assay.type = "counts", method = "clr", MARGIN = "samples", name = "clr")

## Transform the features (taxa) into zero mean, unit variance (z transformation)
tse <- transformAssay(tse, assay.type = "clr", method = "z", MARGIN = "features", name = "z")

# Plot with sechm
features <- rownames(assay(tse, "z"))
sechm_plot <- sechm(tse, assayName = "z", features = features, do.scale = TRUE, sortRowsOn = NULL, cluster_rows = FALSE)

## Display the plot
print(sechm_plot)

Although, when you create the MDS with mia, apply the getNeatOrder method, then plot that with sechm you get something quite different

Screenshot 2024-06-20 at 13 35 30

The code for this is -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

## Group data by taxonomic order
tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

## Add a pseudocount to the counts data
assay(tse, "counts") <- assay(tse, "counts") + 1

## Transform the samples into relative abundances
tse <- transformAssay(tse, assay.type = "counts", method = "clr", MARGIN = "samples", name = "clr")

## Transform the features (taxa) into zero mean, unit variance (z transformation)
tse <- transformAssay(tse, assay.type = "clr", method = "z", MARGIN = "features", name = "z")

## Run MDS on the dataset with Bray-Curtis distances using relative abundance data
reducedDim(tse, "MDS_bray") <- calculateMDS(tse,
                           FUN = vegan::vegdist,
                           method = "bray",
                           assay.type = "z")

## Sort by radial theta using MDS coordinates
sorted_order <- getNeatOrder(reducedDim(tse, "MDS_bray")[, c(1,2)], centering = "mean")
tse <- tse[, sorted_order]

## Create the heatmap with sechm whilst retaining this radial theta ordering
sechm_plot <- sechm(tse, assayName = "z", features=rownames(assay(tse, "z")), do.scale=FALSE, cluster_rows=FALSE, sortRowsOn = NULL)

## Display the plot
print(sechm_plot_custom_sorting)
antagomir commented 2 months ago

Any chance to troubleshoot where the difference appears exactly? is it the plotting command or some of the preceding steps (ordination, sorting, color palette..)

antagomir commented 1 month ago

@SHillman836 is this issue/PR ready or something still pending?

SHillman836 commented 1 month ago

@antagomir ah sorry I actually completely forgot about this one! My bad, just got so caught up in all the other stuff. Will add this to the to do list for this week.

antagomir commented 1 month ago

Great! Let's make sure that there are no near-finished pending tasks..!

SHillman836 commented 1 month ago
Screenshot 2024-07-26 at 14 04 32

Image on the left is sorting with sechm directly -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

## Group data by taxonomic order
tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

## Add a pseudocount to the counts data
assay(tse, "counts") <- assay(tse, "counts") + 1

## Transform the samples into relative abundances
tse <- transformAssay(tse, assay.type = "counts", method = "clr", MARGIN = "samples", name = "clr")

## Transform the features (taxa) into zero mean, unit variance (z transformation)
tse <- transformAssay(tse, assay.type = "clr", method = "z", MARGIN = "features", name = "z")

## Verify the assay data before plotting
test_unsorted <- assay(tse, "z")
View(test_unsorted)

# Plot with sechm using its default MDS angle sorting
sechm_plot_mds_angle_sorting <- sechm(tse, assayName = "z", features = rownames(assay(tse, "z")), 
                                      do.scale = FALSE, cluster_rows = FALSE, breaks = 0.985)

## Display the plot
print(sechm_plot_mds_angle_sorting)

image on right is sorting with getneatorder -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

## Group data by taxonomic order
tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

## Add a pseudocount to the counts data
assay(tse, "counts") <- assay(tse, "counts") + 1

## Transform the samples into relative abundances
tse <- transformAssay(tse, assay.type = "counts", method = "clr", MARGIN = "samples", name = "clr")

## Transform the features (taxa) into zero mean, unit variance (z transformation)
tse <- transformAssay(tse, assay.type = "clr", method = "z", MARGIN = "features", name = "z")

## Run MDS on the dataset with Bray-Curtis distances using relative abundance data
reducedDim(tse, "MDS_bray") <- calculateMDS(tse, FUN = vegan::vegdist, method = "bray", assay.type = "z")

## Sort by radial theta using MDS coordinates
sorted_order <- getNeatOrder(reducedDim(tse, "MDS_bray")[, c(1, 2)], centering = "mean")
tse <- tse[, sorted_order]

## Verify sorted assay data
test_sorted <- assay(tse, "z")
View(test_sorted)

## Create the heatmap with sechm whilst retaining this radial theta ordering
sechm_plot_custom_sorting <- sechm(tse, assayName = "z", features = rownames(assay(tse, "z")), cluster_rows = FALSE, 
                                   do.scale = FALSE, sortRowsOn = NULL, breaks = 0.985)

## Display the plot
print(sechm_plot_custom_sorting)

it's the sortRowsOn parameter that makes the difference. And it's because sechms MDS sorting affects rows. They use cols to alter the row order. Whereas the getNeatOrder miaViz method sorts samples, not rows. so when you apply this and disable the sechm sorting, you get something different. Sechm MDS sorts rows. getNeatOrder MDS sorts cols.

antagomir commented 1 month ago

It should be possible to enable sorting both rows and columns, and the user should be able to decide if they sort neither, one, or both.

SHillman836 commented 1 month ago

In terms of the user deciding for what - this issue was just about differences between sechm and neatOrder. With neatOrder you can just transpose the matrix to sort the rows I guess. But the neat order method has been merged in, and just takes in ordinated data to sort. It's very specific. Then in terms of sechm - you can't sort cols, only rows.

antagomir commented 1 month ago

Great - a quickly checked the code and noticed the following at least.

1) Your example code for getNearOrder is using "bray" method for dissimilarity calculations. This method does not work for negative values, so I am not sure how it can be applied for z-transformed values, also a warning is thrown.

2) The sechm package calls seriate::get_order for the sorting get_order(seriate(dist(y), method=method)) on line 53 here. And this uses base R function dist with default value, the default is euclidean distance. Therefore your example code seems to rely on euclidean distance in the sechm example and bray dissimilarity on the getNeatSort example. This is expected to cause notable differences.

3) If we just want to compare these two methods, then can we get the comparable ordering if we just run getNeatSort for the transposed data? Then it should we an advantage of getNeatSort that user can choose to run it for rows, cols, or both, whereas in sechm only rows are available?

4) However.. sechm is calling the seriation CRAN package which provides many different linear sorting methods, not just MDS. Noticing this, we might like to use this directly because there is larger variety of optimized and actively maintained methods for the same purpose.

Therefore my suggestion for this issue is that we should experiment a bit with seriation and see if that provides the simple solution for our heatmap sorting needs. If yes, then I would remove getNeatOrder function from mia (and apologize for the extra hassle with that) and add examples on heatmap sorting with seriation in OMA.