microbiome / mia

Microbiome analysis
https://microbiome.github.io/mia/
Artistic License 2.0

Rename MARGIN #578

Closed TuomasBorman closed 1 day ago

TuomasBorman commented 3 weeks ago

As discussed in https://github.com/microbiome/mia/pull/562 and https://github.com/microbiome/mia/pull/556, our MARGIN parameter does not follow its usual meaning in base R.

In base R, MARGIN = 2 means that the function is applied to each column, so the result has one entry per column and the column dimension is preserved. In our methods, however, the dimension named by MARGIN can change in the output. In the example below, apply() returns 26 values for the 26 columns, whereas agglomerateByVariable() with MARGIN = 2 reduces the 26 columns to 9.

> library(mia)
> data("GlobalPatterns")
> tse <- GlobalPatterns
> dim(tse)
[1] 19216    26
> temp <- apply(assay(tse), 2, sum)
> length(temp)
[1] 26
> temp <- agglomerateByVariable(tse, MARGIN = 2, f = "SampleType")
> dim(temp)
[1] 19216     9

This is misleading, because MARGIN then has two different meanings. However, flipping the parameter value (for example in agglomerateByVariable) would also be unclear, since the data container and the functions are more complex (where would the variable used for merging be found?).

It is beneficial to use common parameter names across our functions. That is why we should rename MARGIN to by.

MARGIN is used in several functions. Some of them are already in release (there MARGIN must be deprecated first) and some are only in devel (there it can be removed without deprecation).
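For the functions that are already in release, the deprecation could be handled by a small shim like the one below. This is only a sketch in base R: .resolve_by and demo_fun are hypothetical names used for illustration, not existing mia functions, and the default value of by is an assumption.

```r
# Hypothetical helper (not part of mia): prefer the new `by` argument,
# but still accept the old `MARGIN` with a deprecation warning.
.resolve_by <- function(by, MARGIN = NULL) {
    if (!is.null(MARGIN)) {
        # .Deprecated() from base R signals a warning of class "deprecatedWarning"
        .Deprecated(msg = "'MARGIN' is deprecated. Use 'by' instead.")
        by <- MARGIN
    }
    by
}

# Hypothetical released function; it only returns the resolved value
# so that the behaviour of the shim is easy to inspect.
demo_fun <- function(x, by = "rows", MARGIN = NULL) {
    by <- .resolve_by(by, MARGIN)
    by
}

x <- matrix(1:4, nrow = 2)

# New spelling: no warning
demo_fun(x, by = "cols")

# Old spelling: capture the deprecation warning and show its message
tryCatch(
    demo_fun(x, MARGIN = "cols"),
    warning = function(w) conditionMessage(w)
)
```

Functions that exist only in devel would not need the MARGIN argument or the shim at all.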

The usage could look like this:


library(mia)

data("GlobalPatterns")
tse <- GlobalPatterns

# split by columns
temp <- splitOn(tse, f = "SampleType", by = "cols")

# Agglomerate by columns
temp <- agglomerateByVariable(tse, f = "SampleType", by = "columns")

# Cluster by columns
library(bluster)
tse <- addCluster(tse, by = "samples", HclustParam(metric = "bray", dist.fun = vegan::vegdist))

# transform by rows
tse <- transformAssay(tse, method = "z", by = "rows")

# correlate by rows
temp <- getCrossAssociation(tse[1:10, ], tse[1:10, ], by = 1)

The documentation should use the values "cols" and "rows", but we could still silently support 1, 2, "columns", etc., since we have the utility function .check_MARGIN.
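To illustrate the idea of accepting several spellings while documenting only two, here is a minimal sketch in base R. The function .normalize_by is a hypothetical stand-in; the actual body of .check_MARGIN is not shown in this thread, and the alias lists below are assumptions based on the values used in the examples above.

```r
# Hypothetical stand-in for a helper such as .check_MARGIN.
# Maps every accepted spelling to one of the two documented values.
.normalize_by <- function(by) {
    if (length(by) != 1L || is.na(by)) {
        stop("'by' must be a single non-missing value.", call. = FALSE)
    }
    # Assumed alias tables; numeric 1 and 2 are matched after conversion to character
    row_alias <- c("1", "row", "rows", "feature", "features")
    col_alias <- c("2", "col", "cols", "column", "columns", "sample", "samples")
    key <- tolower(as.character(by))
    if (key %in% row_alias) {
        return("rows")
    }
    if (key %in% col_alias) {
        return("cols")
    }
    stop("'by' must be 'rows' or 'cols' (or a supported alias).", call. = FALSE)
}

# All spellings used in the examples above resolve to a documented value
.normalize_by("cols")
.normalize_by("columns")
.normalize_by("samples")
.normalize_by("rows")
.normalize_by(1)
```

Normalizing once at the top of each function would keep the rest of the code working with only "rows" and "cols".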

-Tuomas

TuomasBorman commented 2 weeks ago

As MARGIN is also used in vegan::decostand, we still have to think about what to do with transformAssay and its MARGIN parameter. It does not violate the meaning of MARGIN in base R, so I think we could leave MARGIN in transformAssay as it currently is.