why I might be getting top 15 genus different from RA above certain threshold?

marwa38 commented 1 year ago

Shouldn't the top taxa (genus level here) should be the same output for the RA stacked bar chart of relative abudances (RA) above certain threshold?

I used the same inputs for either of them and I didn't get exactly the same results

RA

Top 20

here the codes for each do you think that is to say that the sorting of taxa sums is not based on the relative abundances?

Relative abundances codes

ps.prev.rel = transform_sample_counts(ps.prev, function(x) x/sum(x)*100)

# SELECTING those which have NA to replace them 
na.genus <- is.na(tax_table(ps.prev.rel)[,"Genus"])
# Replace them with full name consisting of Family and Genus
tax_table(ps.prev.rel)[na.genus][,"Genus"] <- paste(tax_table(ps.prev.rel)[na.genus][,"Family"], 
                                                                  tax_table(ps.prev.rel)[na.genus][,"Genus"])

# agglomerate taxa
np.glom <- tax_glom(ps.prev.rel, taxrank = 'Genus', NArm = FALSE)
ps.prev.melt <- psmelt(np.glom)

# Genus
ps.prev.melt$Genus <- as.character(ps.prev.melt$Genus)

# change to character for easy-adjusted level
# Genus
ps.prev.melt$Genus <- as.character(ps.prev.melt$Genus)

ps.prev.melt <- ps.prev.melt %>%
  group_by(Regime, sample, Genus) %>%
  mutate(mean=mean(Abundance))

# select group mean > 3
keep <- unique(ps.prev.melt$Genus[ps.prev.melt$mean > 3])
ps.prev.melt$Genus[!(ps.prev.melt$Genus %in% keep)] <- "< 3%"
#to get the same rows together
ps.prev.melt_sum <- ps.prev.melt %>%
  group_by(sample, region_regime, Genus) %>%
  summarise(Abundance=sum(Abundance))
# `summarise()` has grouped output by 'sample_Sample', 'Regime'. You can override using the `.groups` argument.

TOP 20 codes

# Genus assigned to Family level ----
# Top fam.gen
ps.prev2 <- ps.prev.tree.ra
# SELECTING those which have NA to replace them 
na.genus <- is.na(tax_table(ps.prev2)[,"Genus"])
# Replace them with full name consisting of Family and Genus
tax_table(ps.prev2)[na.genus][,"Genus"] <- paste(tax_table(ps.prev2)[na.genus][,"Family"], 
                                                                  tax_table(ps.prev2)[na.genus][,"Genus"])

# Agglomerate Taxa at fam:gen level 
ps.prev2.genus <- tax_glom(ps.prev2, taxrank = "Genus", NArm = FALSE)

# select top 
top20GenAssFam <- names(sort(taxa_sums(ps.prev2.genus), decreasing = TRUE))[1:20]

## Prune
ps.prev2.top20GenAssFam.ra <- prune_taxa(top20GenAssFam, ps.prev2.genus)

# plot 
# plot
plot_tree(ps.prev2.top20GenAssFam.ra, nodelabf=nodeplotboot(), label.tips = "Genus", color = "region_regime", shape = "Phylum",
          title = "Phylogenetic tree of Top 20 Genus assigned to Family", ladderize = "left")

ggsave("figures/ps.prev2.top20GenAssFam.ra.tiff", height = 7, width = 15)

# I also used this code and got same outputs
taxa_tab <- Summarize.Taxa(otu, txnm)$Genus %>% Make.Percent() 
taxa_tab <- taxa_tab[order(rowMeans(taxa_tab), decreasing = T), ]
Others <- colSums(taxa_tab[20:nrow(taxa_tab), ])
taxa_tab <- rbind(taxa_tab[1:21, ], Others)

any thoughts please @ycl6

ycl6 commented 1 year ago

@marwa38 I don't think they'll output the same taxa as one uses with raw counts, whereas the other you've normalised the raw counts against total count, i.e. relative abundance. Hence the Top N taxa (or top N% taxa) from the two objects will be different.

marwa38 commented 1 year ago

Thanks so much for your comment. @ycl6 Sorry for not clarifying that ps.prev.ra is relative abundances so also top taxa codes I used have RA input (ps.prev.ra). They both reproduced common taxa but they have some differences but I am not sure why? Thoughts?

joey711 / phyloseq

why I might be getting top 15 genus different from RA above certain threshold? #1661