How to rank by Relative Abundance? Why are values greater than 1?

I have the following phyloseq object:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 546 taxa and 46 samples ]
sample_data() Sample Data:       [ 46 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 546 taxa by 7 taxonomic ranks ]
refseq()      DNAStringSet:      [ 546 reference sequences ]

Which I further subsetted into asthma and control group samples.

I have been asked to do the following: "In the Asthma group, identify the top 10 most abundant fungi genera, list their names and relative abundance."

So I run the following code for the asthma only subset phyloseq object:

psga = tax_glom(asthma, taxrank = "Genus")
ps.tA<- transform_sample_counts(psga, function(x) x/sum(x))
top10A_counts<- sort(taxa_sums(ps.tA), decreasing = TRUE)[1:10]
top10_A <- names(sort(taxa_sums(ps.tA), decreasing = TRUE))[1:10]
taxA <- tax_table(ps.tA)[top10_A,]

What is taxa_sums() doing in line 3? I am trying to access the relative abundance for each genus across the asthma samples. In other words, I want (genus read/ all reads of that sample). I have my transformed_sample_counts to relative abundance in the otu table under ps.tA. Then I sort the taxa_sums() and get the following: ASV1 ASV2 ASV8 ASV5 ASV9 ASV24 ASV47 ASV30 ASV7 ASV36 6.8478176 2.1261399 1.4221070 1.0237118 0.9254110 0.8490590 0.7694261 0.7166408 0.6198016 0.5363423

Is that the sum of the abundance of that specific ASV(genus) across all samples. And therefore it is my top 10?

So now, how do I divide the abundance of that ASV by the total of all ASVs? Would that be how to answer what I was asked to do?

Hi @sarakn97 Your code should work as intended. I used GlobalPatterns to demonstrate it (in this case on the Soil subset).

library(phyloseq)
library(tidyverse)
data(GlobalPatterns)

sub = subset_samples(GlobalPatterns, SampleType == "Soil")
sub = prune_taxa(taxa_sums(sub) > 0, sub)

psga = tax_glom(sub, taxrank = "Genus")
ps.tA =  transform_sample_counts(psga, function(x) x/sum(x))
top10A_counts = sort(taxa_sums(ps.tA), decreasing = TRUE)[1:10]

> top10A_counts
    256977      36155     566886     158370      71074     166835     262714     251499     210176 
0.72228052 0.46000673 0.11545697 0.11503159 0.09888366 0.09633094 0.08012182 0.05797974 0.05611374 
    565556 
0.04927351

Basically, it is the same as using rowSums to calculate the total relative abundance in each row (taxon) of the OTU table.

> rowSums(otu_table(ps.tA)[names(top10A_counts),])
    256977      36155     566886     158370      71074     166835     262714     251499     210176 
0.72228052 0.46000673 0.11545697 0.11503159 0.09888366 0.09633094 0.08012182 0.05797974 0.05611374 
    565556 
0.04927351

Where as the colSums of the OTU table of ps.tA should be 1 in all the samples.

> colSums(otu_table(ps.tA))
CL3 CC1 SV1 
  1   1   1

We can also use colSums to sum up the relative abundance of the top10 taxa in each sample. These values should be less than 1, as oppose to rowSums where if you have a lot of samples (e.g. 50 samples), even with a ~10% relative abundance of a given taxon in most of the samples, adding that up can be way over 1.

> colSums(otu_table(ps.tA)[names(top10A_counts),])
      CL3       CC1       SV1 
0.6734547 0.6969163 0.4811083

Lastly, we can look at the relative abundance of the top10 taxa in your OTU table like below to check if the numbers are correct.

> otu_table(ps.tA)[names(top10A_counts),]
OTU Table:          [10 taxa and 3 samples]
                     taxa are rows
                CL3         CC1          SV1
256977 0.1422748983 0.512557879 0.0674477401
36155  0.3671454610 0.051761398 0.0410998707
566886 0.0545918619 0.051393370 0.0094717401
158370 0.0116050223 0.023915388 0.0795111834
71074  0.0006802543 0.002103322 0.0961000878
166835 0.0765896584 0.019263688 0.0004775948
262714 0.0050176023 0.013454412 0.0616498061
251499 0.0141923143 0.017275910 0.0265115207
210176 0.0012413188 0.004711612 0.0501608125
565556 0.0001162828 0.000479292 0.0486779308

joey711 / phyloseq

How to rank by Relative Abundance? Why are values greater than 1? #1616