joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
582 stars 187 forks source link

Standard deviation of taxa in phyloseq object? #1085

Open Stabilo12 opened 5 years ago

Stabilo12 commented 5 years ago

Hi,

I'm quite new to R and Phyloseq and have a question, which is probably really easy, but I have no clue how to calculate a standard deviation of the taxa in a phyloseq object And I couldn't find the answer online..

I'm albe to calculate the abundance of the taxa (on family level) per intervention group:

ps.taxa <- tax_glom(ps, taxrank = "Family") ps.taxa <-merge_samples(ps.taxa, "Group") ps.taxa <- transform_sample_counts(ps.taxa , function(x) x/sum(x)) ps.dat <- data.frame(otu_table(ps.taxa)) names(ps.dat) <- ((ps.taxa@tax_table@.Data) [, "Family"])

How can I get the SD? Thanks in advanced!

mikemc commented 5 years ago

Do you want the standard deviation of the proportion for each taxon across samples within each group? Using the dplyr package you could do

library(dplyr)

# Assume `ps` is a phyloseq object that has been glommed to the Family level
# and converted to proportions.

tb <- psmelt(ps) %>%
  as_tibble
tb0 <- tb %>%
    group_by(OTU, Intervention_group) %>%
    summarize(Mean = mean(Abundance), SD = sd(Abundance)) %>%
    ungroup

to get a dataframe tb0 with the mean and standard deviation of the proportion of each OTU / Family across samples within each Intervention_group. Note, you should be careful about renaming the OTUs by their Family ignoring the higher level classifications, since the Family name might not be unique. You need to check first:

tax_table(ps)[,"Family"] %>% anyDuplicated

For instance, there are multiple cases in the GlobalPatterns example dataset where the "Family" name is not enough to specify a family.

data(GlobalPatterns)
ps <- tax_glom(GlobalPatterns, "Family")
tax <- tax_table(ps)

tax[duplicated(tax[,"Family"]),"Family"] %>% unique
#> Taxonomy Table:     [6 taxa by 1 taxonomic ranks]:
#>        Family               
#> 96014  "Alteromonadaceae"   
#> 117364 "Piscirickettsiaceae"
#> 530809 "Chromatiaceae"      
#> 7609   "Thiotrichaceae"     
#> 182827 "Sinobacteraceae"    
#> 534609 "Rhodobacteraceae"   

tax[which(tax[,"Family"] == "Piscirickettsiaceae"),]
#> Taxonomy Table:     [3 taxa by 7 taxonomic ranks]:
#>        Kingdom    Phylum           Class                
#> 235822 "Bacteria" "Proteobacteria" "Gammaproteobacteria"
#> 117364 "Bacteria" "Proteobacteria" "Gammaproteobacteria"
#> 574649 "Bacteria" "Proteobacteria" "Gammaproteobacteria"
#>        Order               Family                Genus Species
#> 235822 "Oceanospirillales" "Piscirickettsiaceae" NA    NA     
#> 117364 "Thiotrichales"     "Piscirickettsiaceae" NA    NA     
#> 574649 "Methylococcales"   "Piscirickettsiaceae" NA    NA     
Stabilo12 commented 5 years ago

This is great! Thanks a lot!

AnnaAntonatouPap commented 3 years ago

Hello I would like to ask how I can include the total abundance in this table Thanks a lot