joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
569 stars 187 forks source link

merge_taxa does not behave properly #1663

Open ETaSky opened 1 year ago

ETaSky commented 1 year ago

Hello! I am reporting a possible bug with the function merge_taxa, my understanding of this function is that it would sum the abundance of several taxa and use the taxa info of a selected taxa (first one by default) to represent those taxa after they were merged into one. However, for some reason, the taxa annotation after merging did not behave as I expected. Following is a simple example:

data(GlobalPatterns)
phylo.test = prune_taxa(taxa_names(GlobalPatterns)[c(902, 566, 1223, 748, 542, 299, 866, 777)], GlobalPatterns) # just to create a simple example
#tax_table(phylo.test) %>% as.matrix() %>% as.data.frame() %>% View
test <- merge_taxa(phylo.test, taxa_names(phylo.test)[2:4], archetype = taxa_names(phylo.test)[4])
#tax_table(phylo.test) %>% as.matrix() %>% as.data.frame() %>% View

From the object test, one can see that the 3 taxa have probably been successfully merged, but instead of using the taxa info of that selected taxa to represent the group, it appears the result is the most recent common ancestor of the three taxa. I feel that this is not the expected behavior, is it?

Thanks!

ycl6 commented 1 year ago

Hi @ETaSky It seems to work as intended, where archetype specifies the ASV/taxon name used to represent the summed group.

archetype: (Optional). A single-length numeric or character. The index
          of ‘eqtaxa’, or OTU ID, indicating the species that should be
          kept to represent the summed/merged group of
          species/taxa/OTUs.  The default is to use the OTU with the
          largest count total if counts are available, or to use ‘1’
          (the first OTU in ‘eqtaxa’) otherwise. If ‘archetype’ is not
          a valid index or index-name in ‘eqtaxa’, the first will be
          used, and the value in archetype will be used as the
          index-name for the new species.

In your example, taxa_names(phylo.test)[4] is 109779. If we changed archetype = to the 2nd or 3rd ASV/taxon, the name used to represent the summed group changed to 260879 and 256950 respectively.

library(phyloseq)
data(GlobalPatterns)

# just to create a simple example
phylo.test <- prune_taxa(taxa_names(GlobalPatterns)[c(902, 566, 1223, 748, 542, 299, 866, 777)], GlobalPatterns)
cbind(taxa_names(phylo.test))
#>      [,1]    
#> [1,] "208818"
#> [2,] "260879"
#> [3,] "256950"
#> [4,] "109779"
#> [5,] "181645"
#> [6,] "257001"
#> [7,] "313857"
#> [8,] "12357"

test1 <- merge_taxa(phylo.test, taxa_names(phylo.test)[2:4], archetype = taxa_names(phylo.test)[2])
cbind(taxa_names(test1))
#>      [,1]    
#> [1,] "208818"
#> [2,] "260879"
#> [3,] "181645"
#> [4,] "257001"
#> [5,] "313857"
#> [6,] "12357"

test2 <- merge_taxa(phylo.test, taxa_names(phylo.test)[2:4], archetype = taxa_names(phylo.test)[3])
cbind(taxa_names(test2))
#>      [,1]    
#> [1,] "208818"
#> [2,] "256950"
#> [3,] "181645"
#> [4,] "257001"
#> [5,] "313857"
#> [6,] "12357"

test3 <- merge_taxa(phylo.test, taxa_names(phylo.test)[2:4], archetype = taxa_names(phylo.test)[4])
cbind(taxa_names(test3))
#>      [,1]    
#> [1,] "208818"
#> [2,] "109779"
#> [3,] "181645"
#> [4,] "257001"
#> [5,] "313857"
#> [6,] "12357"

Created on 2023-02-28 by the reprex package (v2.0.1)