Closed Daenarys8 closed 1 year ago
Status of this?
Status of this?
Complete for now.
Great!
We had elsewhere discussion about function naming.
I would suggest to update as follows:
Downside of "merge" term is that it is also used in another meaning, to combine full SE objects, data.frames, matrices etc (instead of within object features as here). Another option would be: agglomerateFeatures and agglomerateFeaturesByRank but that is slower to write.. or combineFeature / combineFeaturesByRank (but combine is also used in the same way than merge, between distinct DataFrames)
Shall we deal with these issues here in this closely related PR, or should we open a new separate issue & PR?
Great!
We had elsewhere discussion about function naming.
I would suggest to update as follows:
* mergeRows -> mergeFeatures (see issue [Deprecate mergeRows/mergeCols #392](https://github.com/microbiome/mia/issues/392)) * agglomerateByRank -> mergeFeaturesByRank (logical to use similar naming in both functions)
Downside of "merge" term is that it is also used in another meaning, to combine full SE objects, data.frames, matrices etc (instead of within object features as here). Another option would be: agglomerateFeatures and agglomerateFeaturesByRank but that is slower to write.. or combineFeature / combineFeaturesByRank (but combine is also used in the same way than merge, between distinct DataFrames)
Shall we deal with these issues here in this closely related PR, or should we open a new separate issue & PR?
I also think mergeFeatures
and mergeFeaturesByRank
are better than agglomerate*
, and it is good to have same naming convention.
We discussed also about enabling rowData
variables in mergeRows
. Currently, the groups must be fed as a vector (f
parameter) but I think it would be nice to be able to specify the grouping variable directly from rowData
as a column name. (Same for mergeCols)
There must be some reason why we didn't implement this before??? Do you remember @antagomir?
I think this PR can be merged, and these new modifications can be done in different PR. @Daenarys8 can you implement these?
I agree. I don't think there is a specific reason, the rowData variables could be called by name as well.
If @Daenarys8 can open a new issue and PR about that it would be great.
You can close this issue when you have checked that it is ready.
Great! We had elsewhere discussion about function naming. I would suggest to update as follows:
* mergeRows -> mergeFeatures (see issue [Deprecate mergeRows/mergeCols #392](https://github.com/microbiome/mia/issues/392)) * agglomerateByRank -> mergeFeaturesByRank (logical to use similar naming in both functions)
Downside of "merge" term is that it is also used in another meaning, to combine full SE objects, data.frames, matrices etc (instead of within object features as here). Another option would be: agglomerateFeatures and agglomerateFeaturesByRank but that is slower to write.. or combineFeature / combineFeaturesByRank (but combine is also used in the same way than merge, between distinct DataFrames) Shall we deal with these issues here in this closely related PR, or should we open a new separate issue & PR?
I also think
mergeFeatures
andmergeFeaturesByRank
are better thanagglomerate*
, and it is good to have same naming convention.We discussed also about enabling
rowData
variables inmergeRows
. Currently, the groups must be fed as a vector (f
parameter) but I think it would be nice to be able to specify the grouping variable directly fromrowData
as a column name. (Same for mergeCols)There must be some reason why we didn't implement this before??? Do you remember @antagomir?
I think this PR can be merged, and these new modifications can be done in different PR. @Daenarys8 can you implement these?
Sure, will give it a shot
Can someone close this PR when it is confirmed to be ready.
Was this ever merged? If I check correctly from above, it was just closed without merging ? The idea was to merge I guess?
Hmm - - ok now I noticed that the renaming scheme discussed above, and agreed, has not made it to this PR yet.
Can we add it (@Daenarys8) ?
More specifically, the idea was to rename "Rows" to "Features" and "Cols" to "Samples".
In addition, to harmonize terminology (use "merge" instead of "agglomerate").
However, one last thing to discuss first: the meaning of "to agglomerate" is somewhat better fit with our case (in terms of language & meaning), in particular when the phylogenetic tree is involved in the process. The phyloseq equivalent to TreeSE agglomerateByRank
is tax_glom
(e.g. https://mikemc.github.io/speedyseq/reference/tax_glom.html).
We could use "glom" instead of "merge". That would be as fast to write, and the meaning would be more specific (merge is easier to confuse with merging of matrices). However, "glom" might be a bit weird term to introduce right now, and it is possible to re-evaluate that later as well.
Hence, in summary, I suggest to rename as:
@Daenarys8 could you add that to this PR as discussed above, OR open a new issue proposing this change, then merging and closing the current PR. Also confirm that the other points discussed / agreed above have now been addessred.
Ok it is more clear to do the other discussed changes in a separate PR. I am merging and closing this one.
Ok - now as this is merged:
the original motivation was ANCOMBC issue #174 where we would like to let users specify taxonomic rank or also other rowData variable to merge rows before running ANCOMBC: https://github.com/FrederickHuangLin/ANCOMBC/issues/174
Now, ideally this new wrapper will help there; it would group by taxonomic rank if this is available in rowData, otherwise it uses the more general grouping.
There are 2 different methods doing similar (merging/grouping rows/features);