Open shrutimicrobiota opened 5 years ago
I too would like help with this. I have read through issues #418 and #1005 but haven't been able to find what I am looking for. I can get the code in #418 to work and get this result:
summarize_taxa(filtered_data1, "Phylum")
Phylum meanRA sdRA minRA maxRA
1: Firmicutes 1.136554e-03 1.757837e-02 5.226026e-06 9.748357e-01
2: Actinobacteria 1.518721e-03 1.992568e-02 5.226026e-06 8.308060e-01
3: Proteobacteria 1.195875e-03 2.002892e-02 5.226026e-06 7.089428e-01
4: Bacteroidetes 4.965731e-04 5.476752e-03 5.226026e-06 1.875852e-01
5: Fusobacteria 2.998574e-04 1.354886e-03 5.226026e-06 1.648079e-02
6: Acidobacteria 2.976221e-05 2.314770e-05 7.467312e-06 6.770538e-05
7: Chloroflexi 3.831400e-05 4.994198e-05 1.028066e-05 1.391487e-04
8: TM7 1.537751e-04 7.062572e-04 5.226026e-06 8.478775e-03
9: [Thermi] 2.054587e-04 5.285998e-04 8.233705e-06 1.876049e-03
10: SR1 2.539156e-05 2.293604e-05 8.735074e-06 8.319468e-05
11: Spirochaetes 3.354825e-05 3.055969e-05 7.767533e-06 1.215260e-04
12: Synergistetes 2.068694e-05 1.425025e-05 9.025760e-06 4.418653e-05
But what I am looking for is to know what the average % abundance is of all samples together -- the number that would contribute to a bar graph such as this: Where it appears, for example, that Actinobacteria makes up about 45% in the female samples.
Thanks!
@ehickman0817 See my suggestion here; that seems to be what you are after.
Another way to do it is, if you're familiar with dplyr, is
library(dplyr)
tb <- psmelt(physeq) %>%
filter(!is.na(Phylum)) %>%
group_by(Sex, Sample, Phylum) %>%
# Add abundances within each phylum and sample
summarize_at("Abundance", sum) %>%
# Get phylum proportions within each sample
summarize_at("Proportion", ~ . / sum(.)) %>%
# Get the mean proportion each phylum across samples within the `Sex` (Male/Female) variable
summarize_at("Proportion", mean)
I think you would get the same proportions, though you should check.
Thanks you so much! That's actually the post I used to help generate my graphs but I didn't think to use it for this. In case anyone else needs this, here is what I did:
ps3 <- tax_glom(filtered_data1, "Phylum")
ps3.top <- merge_less_than_top(ps3, top=4)
ps4 <- merge_samples(ps3.top, "Sex")
summarize_taxa(ps4)
Phylum meanRA sdRA minRA maxRA
1: Actinobacteria 0.497401373 0.0924000714 0.432064656 0.562738090
2: Proteobacteria 0.118322140 0.0159394939 0.107051216 0.129593065
3: Firmicutes 0.367329848 0.0872312474 0.305648042 0.429011655
4: Bacteroidetes 0.013948907 0.0103055865 0.006661757 0.021236057
5: Other 0.002997731 0.0004650833 0.002668868 0.003326595
My only remaining question is when I merge samples by sex, then what is the summarize mean indicating? Mean in males or females or both overall? When I perform the analysis only on ps3.top, I get this, which is slightly different:
summarize_taxa(ps3.top, "Phylum")
Phylum meanRA sdRA minRA maxRA
1: Actinobacteria 0.50583192 0.248091642 9.166681e-03 0.96852494
2: Proteobacteria 0.11686783 0.206015368 1.212441e-03 0.90571009
3: Firmicutes 0.35937091 0.242093337 2.936509e-02 0.98749769
4: Bacteroidetes 0.01513327 0.038105555 1.512013e-05 0.23025235
5: Other 0.00319475 0.007250804 4.506433e-05 0.03935353
Ah never mind, I figured it out. Instead of the code I wrote above, when I use this, I get what I want:
ps4 <- subset_samples(ps3.top, Sex=="Female")
ummarize_taxa(ps4, "Phylum")
Phylum meanRA sdRA minRA maxRA
1: Actinobacteria 0.432064656 0.277851329 9.166681e-03 0.93190020
2: Proteobacteria 0.129593065 0.244960747 1.583222e-03 0.90571009
3: Firmicutes 0.429011655 0.295017610 3.206690e-02 0.98749769
4: Bacteroidetes 0.006661757 0.009330800 3.091286e-05 0.03310275
5: Other 0.002771517 0.006881246 4.506433e-05 0.03174633
I actually like the result when you merge by "Sex" because the min has to be the relative abundance of one treatment and the max has to be the other treatment. Anyone know a way to differentiate the two when running summarize taxa?
Hi folks,
I was just working on this code to get a table with % abundance per group and I realized that it calculates the % based on all samples. So, I changed it to get the % based on the abundance in each group. Please, have a look and let me know if the code works?? Also, I included a arrange
in the function if you want to sort the table by Abundance.
Thanks
summarize_taxa = function(physeq, Rank, GroupBy = NULL, arrange = FALSE){
Rank <- Rank[1]
if(!Rank %in% rank_names(physeq)){
message("The argument to `Rank` was:\n", Rank,
"\nBut it was not found among taxonomic ranks:\n",
paste0(rank_names(physeq), collapse = ", "), "\n",
"Please check the list shown above and try again.")
}
if(!is.null(GroupBy)){
GroupBy <- GroupBy[1]
if(!GroupBy %in% sample_variables(physeq)){
message("The argument to `GroupBy` was:\n", GroupBy,
"\nBut it was not found among sample variables:\n",
paste0(sample_variables(physeq), collapse = ", "), "\n",
"Please check the list shown above and try again.")
}
}
# Start with psmelt
ps.melt <- physeq %>%
tax_glom(taxrank = Rank) %>% # agglomerate at 'Rank' level
psmelt() %>% # Melt to long format
arrange(Rank) # arrange by 'Rank'
if(!is.null(GroupBy)){
columns <- c(GroupBy,Rank)
# Add the variable indicated in `GroupBy`, if provided.
summ.df <- ps.melt %>%
# group by Rank and Group
group_by_at(vars(one_of(columns))) %>%
# summarise and create a column with the relative abundance
summarise(Abundance = sum(Abundance)) %>%
mutate(freq = paste0(round(100 * Abundance / sum(Abundance), 2), "%"))
}else{# only with 'Rank'
summ.df <- ps.melt %>%
group_by_at(Rank) %>%
summarise(Abundance = sum(Abundance)) %>%
mutate(freq = paste0(round(100 * Abundance / sum(Abundance), 2), '%'))
}
if(arrange == FALSE){
return(summ.df)
}else{# sort data with the most abundant in the first line
summ.df.ar <- summ.df %>% arrange(desc(Abundance))
return(summ.df.ar)
}
}
Another way to do it is, if you're familiar with dplyr, is
library(dplyr) tb <- psmelt(physeq) %>% filter(!is.na(Phylum)) %>% group_by(Sex, Sample, Phylum) %>% # Add abundances within each phylum and sample summarize_at("Abundance", sum) %>% # Get phylum proportions within each sample summarize_at("Proportion", ~ . / sum(.)) %>% # Get the mean proportion each phylum across samples within the `Sex` (Male/Female) variable summarize_at("Proportion", mean)
I think you would get the same proportions, though you should check.
Found the thread for my project I'm working on right now. This doesn't work for me at all. After the second " summarize_at" the operation fails. The other solutions wont help me neither.
I have created my relative abundance plot. however, i feel that it is too messy and would like to speperate them by the variables (ie. "CF70", "NUS" etc). i subset the data and created the plot again but i keep getting all the genus in the legend instead of just the ones present in that particular catagory. i tried looking for an answer online to rectify it but couldnt really find an answer to my problem. could someone help me? i have pasted my codes below. in addition, i would also like to combine all the "unclassified" genus" into one just labelled "unclassified". thank you so much in advance!
ps.sub.D0 <- subset_samples(ps, diet %in% c("D0"))
ps.rel.D0 = transform_sample_counts(ps.sub.D0, function(x) x/sum(x)*100)
glom <- tax_glom(ps.rel.D0, taxrank = 'Genus', NArm = FALSE) ps.melt <- psmelt(glom)
ps.melt$Genus <- as.character(ps.melt$Genus) ps.melt <- ps.melt %>% group_by(expid,diet, Genus) %>% mutate(median=median(Abundance))
ps.melt_sum <- ps.melt %>% group_by(expid,diet, Genus) %>% summarise(Abundance=mean(Abundance))
ggplot(ps.melt_sum, aes(x=expid, y = Abundance, fill = Genus)) + geom_bar(stat = "identity", aes(fill=Genus)) + labs(x="", y="Relative Abundance %") + facet_wrap(~diet, scales= "free_x", nrow=1) + theme_classic() + theme(legend.position = "right", legend.text = element_text(size = 7), strip.background = element_blank(), axis.text.x=element_blank())
Hi guys, I am new to writing R scripts, therefore need a little help. I have a barplot showing the relative abundance of the topmost dominant bacterial phyla. I would like to know the average percentage of each phylum in my dataset. How could do that? Any help will be appreciated! Thanks :-)