joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:

http://joey711.github.io/phyloseq/

581 stars 187 forks source link

phyloseq object: Determine relative abundance of one taxon in each sample #1518

Open pauGuas opened 2 years ago

pauGuas commented 2 years ago

Hi, I am trying to do multiple regression to determine what impacts how much cyanobacteria is in each sample. How can I isolate relative abundance of cyanobacteria for this analysis? Thank you!

ycl6 commented 2 years ago

Hi @pauGuas You can first combine the taxa counts at phylum-level, calculate relative abundance, then retrieve only the Cyanobacteria values.

library(phyloseq)
library(ggplot2)
data(GlobalPatterns)

GP <- GlobalPatterns

# Agglomerate taxa at phylum level
GP.phylum <- tax_glom(GP, taxrank = "Phylum")

# Calculate relative abundance
GP.phylum.prop <- transform_sample_counts(GP.phylum, function(x) x / sum(x) )

# Subset object to only Cyanobacteria
GP.phylum.prop.cyano <- subset_taxa(GP.phylum.prop, Phylum == "Cyanobacteria")

> GP.phylum.prop.cyano
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1 taxa and 26 samples ]
sample_data() Sample Data:       [ 26 samples by 7 sample variables ]
tax_table()   Taxonomy Table:    [ 1 taxa by 7 taxonomic ranks ]

> otu_table(GP.phylum.prop.cyano)
OTU Table:          [1 taxa and 26 samples]
                     taxa are rows
               CL3         CC1       SV1      M31Fcsw   M11Fcsw    M31Plmr    M11Plmr    F21Plmr     M31Tong
549656 0.002262548 0.002953131 0.0239102 0.0002740612 0.1024871 0.02174028 0.07532272 0.04229268 0.003366823
         M11Tong  LMEpi24M   SLEpi20M    AQC1cm    AQC4cm    AQC7cm       NP2        NP3       NP5    TRRsed1
549656 0.1287405 0.5262976 0.07716516 0.6467839 0.6746747 0.5557081 0.2390849 0.05762146 0.2786321 0.04637478
         TRRsed2     TRRsed3         TS28        TS29       Even1        Even2        Even3
549656 0.0480137 0.002717488 0.0005290859 0.001760425 0.003331861 0.0006559754 0.0009598967

pauGuas commented 2 years ago

@ycl6 sorry for my late response. Thank you so much for this code! It worked.

najeeha-348 commented 2 years ago

hello i wanted to calculate relative abundance for ASVs, but my samples have spike in bacteria as well. what is the best way of calculating relative and absolute abundance ?

any formula explaining difference between absolute vs. relative abundance

ycl6 commented 2 years ago

@najeeha-348 This isn't really a phyloseq issue but methodology relating to your experimental design. You can use the spike-in bacteria to adjust the read counts before performing any downstream analysis.

Below is an article from Microbiome in 2016 that might help, you can search for more relevant publications https://doi.org/10.1186/s40168-016-0175-0 https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-016-0175-0/figures/6

pauGuas commented 2 years ago

@ycl6 , could you help me with the error I am getting? It is "length of 'dimnames' [1] not equal to array extent".

Genus <- tax_glom(ps, taxrank = "Genus") Genus.prop <- transform_sample_counts(Genus, function(x) x / sum(x) ) Genus.prop.Flavobacterium <- subset_taxa(Genus.prop, Genus == "Flavobacterium ") otu.Flavobacterium <- otu_table(Genus.prop.Flavobacterium)

head(tax_table(Genus.prop)) Taxonomy Table: [6 taxa by 6 taxonomic ranks]: Kingdom Phylum Class Order Family Genus
ASV_1 "Bacteria" "Bacteroidota" "Bacteroidia" "Flavobacteriales" "Flavobacteriaceae" "Flavobacterium" ASV_2 "Bacteria" "Bacteroidota" "Bacteroidia" "Cytophagales" "Spirosomaceae" "Pseudarcicella" ASV_4 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Burkholderiales" "Comamonadaceae" "Limnohabitans" ASV_8 "Bacteria" "Actinobacteriota" "Actinobacteria" "Micrococcales" "Microbacteriaceae" "Rhodoluna"
ASV_9 "Bacteria" "Firmicutes" "Bacilli" "Staphylococcales" "Staphylococcaceae" "Staphylococcus" ASV_13 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Burkholderiales" "Comamonadaceae" "Simplicispira"

ycl6 commented 2 years ago

Hi @pauGuas

I noticed in your subset_taxa(), you have a space right after your genus Flavobacterium. That'll make R unable to find the genus you wanted. For example

library(phyloseq)
data(GlobalPatterns)

GP <- GlobalPatterns

Genus <- tax_glom(GP, taxrank = "Genus")
Genus.prop <- transform_sample_counts(Genus, function(x) x / sum(x) )

If subset_taxa can't fine the string pattern in the dataset, "Sulfolobus " <- notice the unintended space:

> Genus.prop.subset <- subset_taxa(Genus.prop, Genus == "Sulfolobus ")
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent

This will give you warning message because you end up with just 1 merged ASV after subset:

> Genus.prop.subset <- subset_taxa(Genus.prop, Genus == "Sulfolobus")
Warning message:
In prune_taxa(taxa, phy_tree(x)) :
  prune_taxa attempted to reduce tree to 1 or fewer tips.
 tree replaced with NULL.

> Genus.prop.subset
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1 taxa and 26 samples ]
sample_data() Sample Data:       [ 26 samples by 7 sample variables ]
tax_table()   Taxonomy Table:    [ 1 taxa by 7 taxonomic ranks ]

pauGuas commented 2 years ago

@ycl6 I am trying to edit this code so I can determine the relative abundance of one taxon in each season that the samples were taken. I feel like this should be simple but I am just not figuring it out!

pauGuas commented 2 years ago

Here is what I came up with:

physeq_sample_merge <- merge_samples(ps, "Season") ps <- physeq_sample_merge

Agglomerate taxa at Genus level

Genus <- tax_glom(ps, taxrank = "Genus")

Calculate relative abundance

Genus.prop <- transform_sample_counts(Genus, function(x) x / sum(x))

Subset object to only Simplicispira

Genus.prop.Simplicispira <- subset_taxa(Genus.prop, Genus == "Simplicispira") otu.Simplicispira <- otu_table(Genus.prop.Simplicispira) write.csv(otu.Simplicispira, "otu.Simplicispira.season.csv")

I then did this for each taxon I'm interested. Worked!!

ycl6 commented 2 years ago

Hi @pauGuas,

Someone has asked a similar question and your solution is pretty much the same as what I had in that post, except I used %>% to run tax_glom, transform_sample_counts and psmelt in one line. https://github.com/joey711/phyloseq/issues/1521#issuecomment-1002557751