mikemc / speedyseq

Speedy versions of phyloseq functions
https://mikemc.github.io/speedyseq/

tax_glom produces different results from phyloseq's tax_glom #4

Closed mikemc closed 5 years ago

mikemc commented 5 years ago

I end up with different numbers of taxa when glomming to Genus with the two packages; the difference seems to be related to how NAs are handled.

library(phyloseq)
library(speedyseq)
data(GlobalPatterns)

ps1 <- phyloseq::tax_glom(GlobalPatterns, "Genus") # slow
ps2 <- speedyseq::tax_glom(GlobalPatterns, "Genus") # fast
ntaxa(ps1)
ntaxa(ps2)

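library(tidyverse)
ps <- GlobalPatterns
# Convert the taxonomy table to a tibble, keeping the taxa names in an OTU column
tb <- tax_table(ps) %>% as("matrix") %>% as_tibble(rownames = "OTU")
# List the distinct Kingdom-to-Genus lineages among taxa that have a Genus
# assignment; the row count can be compared with the ntaxa() values above
tb %>%
    select(-OTU, -Species) %>%
    filter(!is.na(Genus)) %>%
    distinct()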
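
To illustrate why NA handling could change the number of glommed taxa, here is a minimal base R sketch on a toy taxonomy table. The data frame `tax` and its values are hypothetical (not taken from GlobalPatterns), and the three counting rules are only plausible candidates, not a description of what either package actually does.

```r
# Toy taxonomy table (hypothetical values for illustration only)
tax <- data.frame(
  Family = c("F1", "F1", "F2", NA,   "F3"),
  Genus  = c("G1", "G1", "G2", "G3", NA),
  stringsAsFactors = FALSE
)
rownames(tax) <- paste0("OTU", 1:5)

# Rule A: drop taxa with NA at Genus only, then count distinct lineages
n_drop_genus_na <- nrow(unique(tax[!is.na(tax$Genus), ]))

# Rule B: drop taxa with NA at any rank up to Genus
n_drop_any_na <- nrow(unique(tax[complete.cases(tax), ]))

# Rule C: keep every taxon, treating NA as its own value
n_keep_na <- nrow(unique(tax))

c(drop_genus_na = n_drop_genus_na,
  drop_any_na   = n_drop_any_na,
  keep_na       = n_keep_na)
```

The three rules give three different counts on the same table, so two implementations that differ only in when and where NAs are dropped can disagree on `ntaxa()`.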