Agglomeration methods

antagomir commented 1 year ago

Consider feature / sample agglomeration methods as useful dimension reduction methods.

The first issue to consider is whether it is necessary to have separate functions for Features vs. Samples, or whether there could be just one merge function with a margin argument. In the latter case we could drop "Features" from the function names.
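To make the single-function idea concrete, here is a minimal base R sketch on a plain count matrix. The helper name `merge_by_group` and its arguments are hypothetical and not part of mia; the sketch only illustrates how one function with a margin argument could cover both the feature and the sample case.

```r
# Hypothetical helper, not part of mia: sum the rows (margin = 1) or the
# columns (margin = 2) of a count matrix that share the same group label.
merge_by_group <- function(x, groups, margin = 1) {
  stopifnot(is.matrix(x), margin %in% c(1, 2))
  if (margin == 1) {
    stopifnot(length(groups) == nrow(x))
    rowsum(x, group = groups)
  } else {
    stopifnot(length(groups) == ncol(x))
    # rowsum() works on rows only, so transpose, sum, and transpose back.
    t(rowsum(t(x), group = groups))
  }
}

# Toy data: 4 features (rows) by 3 samples (columns).
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3)))

# Merge features: taxon1 + taxon2 become "A", taxon3 + taxon4 become "B".
by_feature <- merge_by_group(counts, c("A", "A", "B", "B"), margin = 1)

# Merge samples: sample1 + sample3 become "x", sample2 becomes "y".
by_sample <- merge_by_group(counts, c("x", "y", "x"), margin = 2)
```

With this shape, the function name no longer needs to mention features or samples, since the margin carries that information.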

These are closely related to:

mergeFeatures
mergeFeaturesByRank

Then we could add:

mergeFeaturesByCluster (based on cluster() function, based on co-abundances)
mergeFeaturesBySimilarity (similar to speedyseq::tip_glom; agglomerate tree leaves that have a small (user-defined) distance in the tree; also based on hierarchical clustering / co-abundance?; to check how this would differ from byCluster and byTree..)
mergeFeaturesByTree (similar to speedyseq::tree_glom; agglomerate tree leaves that have a small (user-defined) distance in the tree; based on the tree only)

See https://rdrr.io/github/mikemc/speedyseq/man/tree_glom.html
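As a rough illustration of how the cluster-based and distance-based variants relate, the base R sketch below merges features whose pairwise distance falls below a user-defined cutoff. The helper `merge_by_distance`, the toy data, and the choice of average linkage are assumptions made for illustration, not the mia or speedyseq implementation. Here the distance is a co-abundance distance (1 minus Spearman correlation); for a tree-based variant, the same helper could presumably be given tip-to-tip distances from the tree instead.

```r
# Hypothetical helper: cluster features hierarchically on a distance object
# and sum the counts of all features that fall into the same cluster.
merge_by_distance <- function(x, d, h) {
  stopifnot(is.matrix(x), inherits(d, "dist"), attr(d, "Size") == nrow(x))
  # Cut the dendrogram at height h; features joined below h share a cluster.
  clusters <- cutree(hclust(d, method = "average"), h = h)
  rowsum(x, group = clusters)
}

# Toy data: taxon1 and taxon2 rise together across samples, taxon3 falls,
# and taxon4 follows neither pattern.
counts <- rbind(
  taxon1 = c(10, 20, 30, 40, 50),
  taxon2 = c(11, 19, 33, 41, 48),
  taxon3 = c(50, 40, 30, 20, 10),
  taxon4 = c(7, 3, 9, 1, 5)
)

# Co-abundance distance between features: 1 - Spearman correlation.
coab_dist <- as.dist(1 - cor(t(counts), method = "spearman"))

# Only taxon1 and taxon2 are closer than the cutoff, so they are merged.
merged <- merge_by_distance(counts, coab_dist, h = 0.5)
```

Seen this way, byCluster, bySimilarity and byTree would differ mainly in which distance is supplied and how the cutoff is applied, which may help when deciding whether they need separate functions.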

Naming could follow the one suggested in #392. Note that the TreeSummarizedExperiment package sometimes refers to the finest level features as (tree) leaves. Consider the naming at the same time.

TuomasBorman commented 3 months ago

Of those three suggestions, "mergeFeaturesByCluster" is already supported at the level of "standardization" that we want to provide, i.e., creating a wrapper for it might not add value.

tse <- addCluster(tse, ...)
tse <- agglomerateByVariable(tse, by = "rows", f = "cluster")

antagomir commented 3 months ago

For the others, I assume there is no immediate need now.

microbiome / mia

Agglomeration methods #399