merge and cluster: separate or unified functions for rows / cols?

antagomir commented 1 year ago

We currently have:

Clustering in a single function, with the MARGIN argument:

cluster(..., MARGIN = "features")
cluster(..., MARGIN = "samples")

-> This function returns a SummarizedExperiment with the clustering information stored in its rowData (for features) or colData (for samples).

Merging in two different functions:

mergeFeatures(...)
mergeSamples(...)

-> These functions return a merged (Tree)SE object.

It would seem logical to unify the treatment and implement merge() with the MARGIN argument as well.

One issue with this is that the rows have the additional sequence data slot that the columns do not have. Therefore the treatment of rows (features) requires an extra step, and the problem is not entirely symmetric.

A wrapper can certainly deal with this, but it highlights the more fundamental question of whether we want to enforce and maintain separate merge functions for samples and features. I do not see an immediate need for that: the merging procedure is near-identical, and both rows and cols even have the (optional) tree information.
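To make the asymmetry concrete, here is a minimal base R sketch, not the mia implementation. A plain count matrix stands in for the (Tree)SE assay and a named character vector stands in for the row sequence data. The helper names .merge_core and .merge_seqs are hypothetical, as is the choice of keeping the first sequence of each group as its representative.

```r
# Hypothetical sketch in base R; none of these helpers exist in mia.
# `x` is a features-by-samples count matrix and `f` a grouping vector.

# Shared core: sum the entries of `x` within the groups of `f` along one margin.
.merge_core <- function(x, f, by_rows = TRUE) {
    f <- factor(f)
    if (by_rows) {
        rowsum(x, group = f)            # groups become rows
    } else {
        t(rowsum(t(x), group = f))      # groups become columns
    }
}

# Row-only extra step: keep one representative sequence per group.
# Taking the first sequence is an assumption made for illustration only.
.merge_seqs <- function(seqs, f) {
    vapply(split(seqs, factor(f)), function(s) s[[1]], character(1))
}

# Toy data: 4 features, 3 samples, one sequence per feature.
x <- matrix(1:12, nrow = 4,
            dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))
seqs <- c(f1 = "ACGT", f2 = "ACGA", f3 = "TTGA", f4 = "TTGC")

merged_rows <- .merge_core(x, c("A", "A", "B", "B"), by_rows = TRUE)
merged_seqs <- .merge_seqs(seqs, c("A", "A", "B", "B"))
merged_cols <- .merge_core(x, c("g1", "g2", "g1"), by_rows = FALSE)
```

The core is the same for both margins; only the feature path needs the additional sequence step, which is the part a unified wrapper would have to special-case.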

TuomasBorman commented 1 year ago

I agree with this. We could have a merge(MARGIN = "features") function; inside it, MARGIN would determine whether the internal .merge_features or .merge_samples function is run.
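A rough sketch of what such a dispatch could look like, assuming plain matrices in place of (Tree)SE objects. The bodies of .merge_features and .merge_samples below are placeholders that simply sum counts per group, and the wrapper name merge_by_margin is hypothetical (chosen here so that the example does not mask base::merge).

```r
# Hypothetical sketch, not the mia API: plain matrices stand in for (Tree)SE.
# Placeholder internals that sum counts within each group.
.merge_features <- function(x, f) rowsum(x, group = factor(f))
.merge_samples  <- function(x, f) t(rowsum(t(x), group = factor(f)))

merge_by_margin <- function(x, f, MARGIN = c("features", "samples")) {
    # match.arg() rejects anything other than "features" or "samples"
    MARGIN <- match.arg(MARGIN)
    expected <- if (MARGIN == "features") nrow(x) else ncol(x)
    if (length(f) != expected) {
        stop("length(f) must match the number of ", MARGIN)
    }
    # Dispatch to the margin-specific internal function
    switch(MARGIN,
        features = .merge_features(x, f),
        samples  = .merge_samples(x, f)
    )
}

# Toy data: 4 features, 3 samples.
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))

by_feature <- merge_by_margin(counts, c("A", "A", "B", "B"),
                              MARGIN = "features")
by_sample  <- merge_by_margin(counts, c("g1", "g2", "g1"),
                              MARGIN = "samples")
```

With this layout the user-facing interface mirrors cluster(), while any row-specific handling stays inside the features branch.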

antagomir commented 3 months ago

Can this also be closed?

TuomasBorman commented 3 months ago

Yep

microbiome / mia

merge and cluster: separate or unified functions for rows / cols? #401