agglomeration functions and na.rm

TuomasBorman commented 1 week ago

In this issue https://github.com/microbiome/mia/issues/654, we noticed that the use of na.rm is not intuitive in agglomeration functions.

This PR adds option for excluding NAs in agglomeration functions (agglomerateByRank, agglomerateByVariable and agglomerateByPrevalence). na.rm now works similarly to sum(x, na.rm = TRUE) in these aforementioned functions.

In agglomerateByVariable na.rm controlled whether we remove those rows that did not have any group information, i.e., their group was NA. Now this parameter is named group.rm.

In agglomerateByRank na.rm controlled whether we remove those rows that did not have any taxonomy information. Now this parameter is called empty.rows.rm.

These parameter names can be still modified. Maybe we could unify so that both would have parameter empty.row.rm.

na.rm is now handled in mia, but I opened issue in scuttle if this can be handled directly in function that mia utilizes: https://github.com/LTLA/scuttle/issues/33

Let's wait until the issue in scuttle is solved.

antagomir commented 1 week ago

How about empty.group.rm or (group.na.rm and row.na.rm) to enhance consistency?

TuomasBorman commented 1 week ago

row.rm might also be one option to keep it short. I think we could use same name for both

antagomir commented 1 week ago

ok to me as long as it is consistent.

TuomasBorman commented 1 week ago

The parameter name for removing NA rows is empty.rm

TuomasBorman commented 1 week ago

After discussing in scuttle package, it was decided that na.rm will not be implemented in scuttle. This is why it is implemented in mia. mia uses scuttle function by default. If na.rm=TRUE and there are NAs, mia uses own implementation

microbiome / mia

agglomeration functions and na.rm #658