microbiome / mia

Microbiome analysis
https://microbiome.github.io/mia/
Artistic License 2.0
45 stars 25 forks source link

fix tree agglomeration in agglomerateByPrevalence #575

Closed thpralas closed 2 days ago

thpralas commented 3 weeks ago

As mentioned in https://github.com/microbiome/mia/issues/573, agglomerateByPrevalence returns a TSE object with rowTree = NULL.

This PR removes the class checking and class conversion from TSE to SCE that removes the tree.

With this modification, the tree is returned and agglomerated if agglomerate.tree = TRUE.

data("GlobalPatterns")
tse <- GlobalPatterns
tse <- agglomerateByPrevalence(tse, rank = "Phylum", agglomerate.tree = FALSE)
rowTree(tse)

Phylogenetic tree with 19216 tips and 19215 internal nodes. Tip labels: 549322, 522457, 951, 244423, 586076, 246140, ... Node labels: , 0.858.4, 1.000.154, 0.764.3, 0.995.2, 1.000.2, ... Rooted; includes branch lengths.

data("GlobalPatterns")
tse <- GlobalPatterns
tse <- agglomerateByPrevalence(tse, rank = "Phylum", agglomerate.tree = TRUE)
rowTree(tse)

Phylogenetic tree with 67 tips and 66 internal nodes. Tip labels: 549322, 173903, 368907, 51888, 591769, 249529, ... Node labels: , 0.858.4, 1.000.21479, 0.946.619, 0.953.588, 0.940.473, ... Rooted; includes branch lengths.

TuomasBorman commented 3 weeks ago

Also add input check for agglomerate.tree. Fail fast.

TuomasBorman commented 3 weeks ago

Realized one thing: the function might not work if the input is SummarizedExperiment since it has no trees.

--> Create generic method for TreeSummarizedExperiment

  1. Call next method which will can generic SE method
  2. Merge trees
  3. Merge reference sequences (see agglomerateByVariable)
TuomasBorman commented 1 week ago

Agglomeration of reference sequences was not working, but now everything should be fine.

Can you add tests for reference sequences? Check other agglomeration functions and their unit tests.

You can find suitable data from data("SilvermanAGutData", package = "miaTime")