Open TuomasBorman opened 7 months ago
I agree with all suggestions here.
addTaxonomyTree()
and taxonomyTree()
are calculating tree based on taxonomy info provided in rowData
. This is misleading users because the calculated tree is not really a taxonomy tree. Taxonomy tree refers usually phylogeny / phylogenetic tree that is calculated based on genetic distances.
The calculated tree takes into account only "hierarchy distances" without genetic distances "species A is x edges away from species B". --> So the calculated tree is "hierarchy tree"
I propose that we rename taxonomyTree()
to getTree()
(highlights the returned value which is the tree itself) and addTaxonomyTree()
to addTree()
.
@antagomir @thpralas
Even that could be confused with phylogenetic tree.. how about getHierarchy()
or mapRanksToTree()
or makeTreeFromRanks()
?
If the tree itself is formally similar to phylogenetic trees, do we need a separate "add tree" function for this? Can we just use the same procedure than with phylogenetic trees. Then there would be only one additional function, which is "get tree". Users would then need to add this separately. This is an extra step but perhaps it would help some users better understand the procedure?
I would go then with getHierarchyTree
--> explains quite well what is happening
I think add*
is quite convenient and we should not remove it. Rather we should focus on documentation which is not too clear. See for example reference page https://microbiome.github.io/mia/reference/index.html It could be much more clear especially when we don't even have that many functions.
I think we could move addHierarchyTree under separate page of manual (not under taxonomy). In the (excel) table that I sent you, there was some ideas how to group these functions more clearly.
In the documetation, we could refer this as hierarchy tree and not taxonomy tree. The possibility for misunderstanding is not that big then anymore
Ok
I made a commit but I forgot referencing this issue in the commit name : 1e0b96fd4b8970ace3085651dc5155826c51d232
We have to think about basic principles of functions.
TreeSE
more --> returnTreeSE
if possiblerunCorrelation()
...).1. cols() & rows() and related functions
We have some functions that are doing some operation to row- or column-wise. One of these is
mergeCols()
/mergeRows()
function. I think we should combine these functions and addMARGIN
parameter like we have done incluster()
function. -->merge(MAGIN = "rows")
Moreover, we have some functions that are doing similar things but slightly differently (
mergeRows()
vsmergeRowsByRanks()
). I think we could combine these function by addingby.rank
parameter. -->agglomerate(MARGIN = "rows", by.rank = TRUE)
These affect also
splitOn()
/unsplitOn()
andsplitByRanks()
functions -->split()
2.
We have multiple aliases for
transformAssay()
function. Why don't we have justtransformAssay()
(or maybetransform()
) and unexport all aliases. It is more unclear when we have lots of aliases.2. Return TreeSE
Some functions (such as
getExperimentCrossAssociation()
) does not returnTreeSE
but only the table or list of matrices. We could consider if it is better to always returnTreeSE
or if it makes things more complicated to do so. Results that cannot be stored toreducedDIm
,colData
orrowData
(or other commonly used slot) can be stored tometadata
.Good things:
TreeSE
Negative thing:
metadata
might complicate things because the function can be used to find rows to subset.3. Function naming
I think the function naming can be improved. We could use key words for tasks
TreeSE
:runAlpha
,runPCA
...TreeSE
);calculatePCA
,calculateRDA
getTaxonomyTree
addTaxonomyTree
TreeSE
:importDADA2
makeSomething or convertTo --> convert
TreeSE
to some other object:makePhyloseq
Also, we can stick with rows and cols in function naming.
getRareRows()
for instance (or perhapsgetRare(MARGIN = "rows")
).4. Make feature-related functions support columns
For example,
addTaxonomyTree()
adds tree only to rows. However, since columns can have also tree, we could add support for columns. We might have to consider ranks for rows and columns separately:TAXONOMY_RANKS
-->ROW_RANKS
&COL_RANKS
These are things that we might want to discuss before we finalize miaverse, After formal publication, I think it is not good that we change names and these conventions. This is quite free thoughts but hopefully initialize some discussion from where we can start.