joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

How to Identify Top 10 Most Abundant Fungi Genera and list them with Relative Abundance. #1614

Open sarakn97 opened 1 year ago

sarakn97 commented 1 year ago

Hello, I need to identify the 10 most abundant genera in the asthma samples and in the control samples that I have.

I have the following phyloseq object:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 546 taxa and 37 samples ]
sample_data() Sample Data:       [ 37 samples by 12 sample variables ]
tax_table()   Taxonomy Table:    [ 546 taxa by 7 taxonomic ranks ]
refseq()      DNAStringSet:      [ 546 reference sequences ]

14 of the samples have asthma and 23 do not. I split the phyloseq object into two groups, asthma and control, and used the following commands to get the top taxa for each group:

# Get most abundant taxa for asthma group
top10_A <- names(sort(taxa_sums(asthma), decreasing = TRUE)[1:10])

# Get most abundant taxa for control group
top10_C <- names(sort(taxa_sums(no_asthma), decreasing = TRUE)[1:10])

# Look up the taxonomy of those taxa
tax_table(asthma)[top10_A, ]
tax_table(no_asthma)[top10_C, ]

The above returned the 10 most abundant ASVs and their taxonomy. However, I was asked to find the most abundant genera. How do I do this at the genus level only, and get the relative abundance of each genus? For example, in the results below, ASV1 and ASV38 have the same genus, and so do ASV8 and ASV14.

Results:

      Kingdom    Phylum             Class                  Order                  Family
ASV1  "k__Fungi" "p__Ascomycota"    "c__Saccharomycetes"   "o__Saccharomycetales" "f__Saccharomycetales_fam_Incertae_sedis"
ASV2  "k__Fungi" "p__Ascomycota"    "c__Saccharomycetes"   "o__Saccharomycetales" "f__Saccharomycetaceae"                  
ASV5  "k__Fungi" "p__Ascomycota"    "c__Dothideomycetes"   "o__Pleosporales"      "f__Pleosporaceae"                       
ASV7  "k__Fungi" "p__Ascomycota"    "c__Dothideomycetes"   "o__Pleosporales"      "f__Phaeosphaeriaceae"                   
ASV8  "k__Fungi" "p__Ascomycota"    "c__Saccharomycetes"   "o__Saccharomycetales" "f__Phaffomycetaceae"                    
ASV9  "k__Fungi" "p__Ascomycota"    "c__Dothideomycetes"   "o__Capnodiales"       "f__Cladosporiaceae"                     
ASV14 "k__Fungi" "p__Ascomycota"    "c__Saccharomycetes"   "o__Saccharomycetales" "f__Phaffomycetaceae"                    
ASV16 "k__Fungi" "p__Basidiomycota" "c__Malasseziomycetes" "o__Malasseziales"     "f__Malasseziaceae"                      
ASV17 "k__Fungi" "p__Basidiomycota" "c__Agaricomycetes"    NA                     NA                                       
ASV38 "k__Fungi" "p__Ascomycota"    "c__Saccharomycetes"   "o__Saccharomycetales" "f__Saccharomycetales_fam_Incertae_sedis"
      Genus              Species          
ASV1  "g__Candida"       "s__albicans"    
ASV2  "g__Saccharomyces" "s__kudriavzevii"
ASV5  "g__Alternaria"    NA               
ASV7  "g__Phaeosphaeria" "s__oryzae"      
ASV8  "g__Cyberlindnera" "s__jadinii"     
ASV9  "g__Cladosporium"  "s__tenuissimum" 
ASV14 "g__Cyberlindnera" "s__jadinii"     
ASV16 NA                 NA               
ASV17 NA                 NA               
ASV38 "g__Candida"       "s__albicans"    

Thank You.

sarakn97 commented 1 year ago

I believe that I figured it out with tax_glom?

# Merge ASVs that share the same genus-level taxonomy
psga <- tax_glom(asthma, taxrank = "Genus")

# Convert counts to relative abundance within each sample
ps.tA <- transform_sample_counts(psga, function(OTU) OTU / sum(OTU))

# Top 10 genera by relative abundance summed across samples
top10A_counts <- sort(taxa_sums(ps.tA), decreasing = TRUE)[1:10]
top10_A <- names(top10A_counts)
# Taxonomy of the merged (genus-level) taxa
taxA <- tax_table(psga)[top10_A, ]

gmteunisse commented 1 year ago

Yes, tax_glom is the right function. However, you should set NArm = FALSE so that ASVs without a genus annotation are not discarded. If they are dropped before you normalize, their reads are missing from each sample's total and your relative abundances will not be correct.
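
Putting the two comments together, a minimal sketch of the whole workflow might look like the following. The helper name `top_genera` is hypothetical (it is not part of phyloseq), and the example runs on a small subset of the `GlobalPatterns` dataset that ships with phyloseq, purely as a stand-in for the `asthma` and `no_asthma` objects from this thread. Two choices here are assumptions rather than anything stated above: genera are ranked by mean relative abundance across samples (the summed value divided by the number of samples), and samples with zero reads are dropped first so the normalization does not produce NaN.

```r
library(phyloseq)

# Hypothetical helper (not part of phyloseq): returns the top `n` genera
# ranked by mean relative abundance across samples.
top_genera <- function(ps, n = 10) {
  # Assumption: drop empty samples so x / sum(x) never divides by zero
  ps <- prune_samples(sample_sums(ps) > 0, ps)

  # Merge ASVs by genus; NArm = FALSE keeps taxa with no genus annotation
  ps_genus <- tax_glom(ps, taxrank = "Genus", NArm = FALSE)

  # Convert counts to per-sample proportions
  ps_rel <- transform_sample_counts(ps_genus, function(x) x / sum(x))

  # Mean relative abundance of each merged taxon across samples
  mean_rel <- taxa_sums(ps_rel) / nsamples(ps_rel)
  top <- sort(mean_rel, decreasing = TRUE)[seq_len(min(n, length(mean_rel)))]

  # Look up genus labels in the merged taxonomy table (NA = unannotated)
  tax <- as(tax_table(ps_rel), "matrix")
  data.frame(
    Genus = unname(tax[names(top), "Genus"]),
    MeanRelAbundance = unname(top),
    row.names = names(top)
  )
}

# Stand-in data: a small subset of a dataset bundled with phyloseq
data(GlobalPatterns)
gp_ch <- subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")

res <- top_genera(gp_ch, n = 10)
print(res)
```

With the objects from this thread, the same helper would be called as `top_genera(asthma)` and `top_genera(no_asthma)`. Rows whose Genus is NA are the unannotated groups that NArm = FALSE retains; they can be filtered out of the final table after normalization without distorting the proportions of the remaining genera.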