microbiome / miaViz

Microbiome Analysis Plotting and Visualization
https://microbiome.github.io/miaViz
Artistic License 2.0

Explained Variance #21

Closed ChouaibB closed 3 years ago

ChouaibB commented 3 years ago

In several ordination (or dimensionality reduction) methods, the explained variance needs to be calculated and reported. A TreeSE object stores the reduced-dimensionality data along with data related to the ordination method itself (e.g. eigenvalues for PCoA or PCA). Would a method/function for calculating the explained variance for k dimensions be a useful addition to the miaViz package?
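To make the request concrete, here is a minimal sketch of the calculation in plain Python. It is independent of miaViz and TreeSE; the function name `explained_variance` and its arguments are hypothetical, and it assumes the eigenvalues are non-negative and sorted in decreasing order, as they are for PCA.

```python
def explained_variance(eigenvalues, k=None):
    """Per-axis and cumulative share of the eigenvalue sum for the first k axes.

    Hypothetical helper for illustration only; assumes non-negative
    eigenvalues sorted in decreasing order.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    if k is None:
        k = len(eigenvalues)
    # Share of the total carried by each of the first k axes.
    proportions = [ev / total for ev in eigenvalues[:k]]
    # Running sum gives the share explained by axes 1..i together.
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


# Made-up eigenvalues: the first two axes carry about 40% and 30%.
props, cum = explained_variance([4.0, 3.0, 2.0, 1.0], k=2)
print(props, cum)
```

The denominator is the sum over all eigenvalues, not just the first k, so the reported shares do not change with the number of axes requested.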

antagomir commented 3 years ago

This is a good point. There is already some discussion of this in the OMA beta diversity section (currently lines 213-221).

Sometimes a similar measure is shown for MDS/PCoA. The interpretation is generally different, however, and hence we do not recommend using it. PCA is a special case of PCoA with Euclidean distances. With non-Euclidean dissimilarities, PCoA uses a trick where the pointwise dissimilarities are first cast into similarities in a Euclidean space (with some information loss, i.e. stress) and then projected onto the maximal variance axes. In this case, the maximal variance axes do not directly reflect the correspondence between the projected distances and the original distances, as they do for PCA.

As far as I have seen, the explained variance is generally used only for PCA and other eigenvalue-based ordination methods (most notably MDS, aka PCoA). For PCoA, the interpretation is indeed somewhat different from PCA, and measuring the stress (information loss) instead might be more appropriate, although stress concerns the whole ordination rather than specific axes.

On the other hand, many researchers still use PCoA explained variances and may not even be aware of the differences. We have at least shown in OMA how to calculate this for PCoA, and described the interpretation problem.

Because this is widely used, it could make sense to provide it. However, I am not sure we want to endorse it as a best practice by making it readily available through a function rather than through an explicit programming example. We can discuss this here.

If this is provided as a function, then it could go into miaViz, and it would probably be good to calculate it for all axes in one go. Such a function should return the stress as well, and this can be combined with the Shepard plot (see OMA).
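As a rough illustration of the stress part, the sketch below compares the original dissimilarities with the Euclidean distances between the ordinated points, and also returns the paired values that a Shepard plot would display. This is plain Python, not miaViz code; the function name `stress_and_shepard` is hypothetical, and the normalization by the sum of squared embedded distances is one stress-1-style convention among several, chosen here as an assumption.

```python
import math


def stress_and_shepard(original, coords):
    """Return (stress, pairs) for an ordination.

    original: square, symmetric matrix of dissimilarities (list of lists).
    coords:   one coordinate tuple per sample in the reduced space.
    pairs:    (original dissimilarity, embedded distance) for each sample
              pair, i.e. the points a Shepard plot would show.
    Hypothetical helper; the normalization convention is an assumption.
    """
    n = len(original)
    pairs = []
    squared_residuals = 0.0
    squared_embedded = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance between samples i and j in the ordination.
            embedded = math.dist(coords[i], coords[j])
            pairs.append((original[i][j], embedded))
            squared_residuals += (original[i][j] - embedded) ** 2
            squared_embedded += embedded ** 2
    return math.sqrt(squared_residuals / squared_embedded), pairs


# Made-up example: three samples placed on a single axis.
original = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
coords = [(0.0,), (1.0,), (3.0,)]
stress, pairs = stress_and_shepard(original, coords)
print(stress, pairs)
```

Unlike the per-axis explained variance, this gives a single value for the whole ordination, which matches the caveat above that stress is not specific to individual axes.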

FelixErnst commented 3 years ago

Since this is usually given in the axis caption, it can be added very easily later on. As Leo has mentioned, the mathematical basis for this is sketchy at best, and therefore I would avoid any functionality that claims otherwise.

I would close this issue. OK?

antagomir commented 3 years ago

OK with me.