Closed emankhalaf closed 5 years ago
Hi,
I suggest you take a look at the preDA function in the DAtest package to group the low adundant genera. After that you should be able to either use plot_bar() from phyloseq or melt your data and use ggplot2 directly.
Regards, Martin
I checked the package, it is mainly for differential abundance estimation. But, my question is simply how to generate 100% stacked bars of relative abundance % at a specific taxonomic level. My trial resulted in stacked bars but not 100%, so I did it manually on excel but still not the same quality of figures as in published articles.
This kind of errors are generally caused by small annoying parsing errors somewhere in the code, so if you can share your code and data I can take a look. A link to the graphs you want the output to look like would also be a help.
I used two different methods with different outputs. The first one using the following code:
silk_workingr <- transform_sample_counts(silk_working, function(x) x/sum(x))
# agglomerate taxa
glom <- tax_glom(silk_workingr, taxrank = 'Genus')
dat <- psmelt(glom)
dat$Genus <- as.character(dat$Genus)
library(plyr)
medians <- ddply(dat, ~Genus, function(x) c(median=median(x$Abundance)))
remainder <- medians[medians$median <= 0.01,]$Genus
dat[dat$Genus %in% remainder,]$Genus <- 'Remainder'
ggplot(dat,
aes(x=Genus,
y=Abundance)) + geom_boxplot() + coord_flip()
library(RColorBrewer)
colourCount = length(unique(dat$Genus))
getPalette = colorRampPalette(brewer.pal(9, "Set2"))
p <- ggplot(data=dat, aes(x=FieldBlockReplicate, y=Abundance, fill=Genus))
p + geom_bar(aes(), stat="identity", position="stack") + scale_fill_manual(values=getPalette(colourCount)) + theme(legend.position="right") + guides(fill=guide_legend(nrow=8))
The second method as follow:
silk_genus = tax_glom(silk_working, "Genus")
tax_table(silk_genus)
silk_gen_ab = transform_sample_counts(silk_genus, function(x) x/sum(x))
silk_gen_ab
taxa_sums(silk_gen_ab)/42
sample_sums(silk_gen_ab)
top20 = names(sort(taxa_sums(silk_genus), TRUE)[1:20])
silk_genus_top20 = prune_taxa(top20, silk_genus)
I selected top20 because I know my microbiome is dominated by few taxa and the rest are rare / very low relative abundance.
Then, I moved to excel picking up genera with relative abundance > 1% and create an extra column for others where the taxa rel abundance <1% were grouped
The wired thing is that the first method resulted in 5 genera and the second method resulted in 11 taxa??? I would be thankful if you shared your e.mail with me, so I could send you the files. Mine is eimanpharmacist@gmail.com Thank you very much Eman
Hi,
You get diffent numbers of genera that is equal to or higher than 1% because the first method relies on median and the second uses mean.
I suggest the functions below, one uses a percentage threshold and the other a number of taxa, to do the filtering (directly on the phyloseq object):
merge_low_abundance <- function(pobject, threshold=0.001){ transformed <- transform_sample_counts(pobject, function(x) x/sum(x)) otu.table <- as.data.frame(otu_table(transformed)) otu.list <- row.names(otu.table[rowMeans(otu.table) < threshold,]) merged <- merge_taxa(transformed, otu.list, 1) for (i in 1:dim(tax_table(merged))[1]){ if (is.na(tax_table(merged)[i,2])){ taxa_names(merged)[i] <- "Other" tax_table(merged)[i,1:7] <- "Other"} } return(merged) }
merge_less_than_top <- function(pobject, top=10){ transformed <- transform_sample_counts(pobject, function(x) x/sum(x)) otu.table <- as.data.frame(otu_table(transformed)) otu.sort <- otu.table[order(rowMeans(otu.table), decreasing = TRUE),] otu.list <- row.names(otu.sort[(top+1):nrow(otu.sort),]) merged <- merge_taxa(transformed, otu.list, 1) for (i in 1:dim(tax_table(merged))[1]){ if (is.na(tax_table(merged)[i,2])){ taxa_names(merged)[i] <- "Other" tax_table(merged)[i,1:7] <- "Other"} } return(merged) }
I am having the same issue as @emankhalaf. @MSMortensen how would I use your code and implement it into the plot_bar function?
@avonjackowski The output of both the functions I have shown are a phyloseq object. Using the plot_bar function directly on that output should work without any problem.
It should look something like this:
ps.gen <- tax_glom(psdata, "Genus") ps.gen.top10 <- merge_less_than_top(ps.gen, top=10)
plot_bar(ps.gen.top.10, fill = "Genus")
@MSMortensen ok that makes a lot of sense - thank you!
However if I try to run:
merge_less_than_top <- function(all_data, top=10){
transformed <- transform_sample_counts(all_data, function(x) x/sum(x))
otu.table <- as.data.frame(otu_table(transformed))
otu.sort <- otu.table[order(rowMeans(otu.table), decreasing = TRUE),]
otu.list <- row.names(otu.sort[(top+1):nrow(otu.sort),])
merged <- merge_taxa(transformed, otu.list, 1)
for (i in 1:dim(tax_table(merged))[1]){
if (is.na(tax_table(merged)[i,2])){
taxa_names(merged)[i] <- "Other"
tax_table(merged)[i,1:7] <- "Other"}
}
return(merged)
}
ps.gen <- tax_glom(all_data, "Class")
ps.gen.top10 <- merge_less_than_top(ps.gen, top=10)
plot_bar(ps.gen.top10, fill = "Class")
I get the error: Error in [<-
(*tmp*
, i, 1:7, value = "Other") : subscript out of bounds. What am I missing now?
@avonjackowski How many taxonomical levels do you have in your tax_table? This seems to be because you do not have less than 7 levels in your tax table. You should edit the number of columns in this line: tax_table(merged)[i,1:7] <- "Other"}
@MSMortensen ok frustration makes one overlook things like that :( . I have 6 levels. I computed it and it worked really well. The one last thing I have to ask is, why some stacked bars get pulled to 1 or 100%, whereas others are not?
My code for the plot_bar function is as follows:
plot_bar(ps.gen.top10, fill = "Class", "Depth") +
geom_bar(stat="identity") +
facet_grid(~Location) +
# facet_grid(~Substrate) +
coord_flip() +
ylab("Percentage of Sequences") + ylim(0, 1) +
geom_bar(aes(fill = Class), stat = "identity", position = "stack")
THANK YOU SOOOO MUCH FOR ALL YOUR HELP!
@avonjackowski I don't know why one of the bars doesn't reach 1. I suggest that you do not use ylim and your two geom_bar arguments does not add anything as plot_bar sets those.
@MSMortensen Alright did that. The reason why some bars do not go all the way to 1, is because even though I have "Sample" names, I try and plot the "Location"by"Depth". More than one sample can be at a given "Location". Thus I would need a mean of the relative abundance per "Location". Would I have to insert another piece of code before plotting to achieve this?
@avonjackowski No that is the way it should be done. I cannot really tell you why it does not add completely.
Hi,
You get diffent numbers of genera that is equal to or higher than 1% because the first method relies on median and the second uses mean.
I suggest the functions below, one uses a percentage threshold and the other a number of taxa, to do the filtering (directly on the phyloseq object):
Merge low abundance taxa
By percentage threshold
merge_low_abundance <- function(pobject, threshold=0.001){ transformed <- transform_sample_counts(pobject, function(x) x/sum(x)) otu.table <- as.data.frame(otu_table(transformed)) otu.list <- row.names(otu.table[rowMeans(otu.table) < threshold,]) merged <- merge_taxa(transformed, otu.list, 1) for (i in 1:dim(tax_table(merged))[1]){ if (is.na(tax_table(merged)[i,2])){ taxa_names(merged)[i] <- "Other" tax_table(merged)[i,1:7] <- "Other"} } return(merged) }
Anything not with top x
merge_less_than_top <- function(pobject, top=10){ transformed <- transform_sample_counts(pobject, function(x) x/sum(x)) otu.table <- as.data.frame(otu_table(transformed)) otu.sort <- otu.table[order(rowMeans(otu.table), decreasing = TRUE),] otu.list <- row.names(otu.sort[(top+1):nrow(otu.sort),]) merged <- merge_taxa(transformed, otu.list, 1) for (i in 1:dim(tax_table(merged))[1]){ if (is.na(tax_table(merged)[i,2])){ taxa_names(merged)[i] <- "Other" tax_table(merged)[i,1:7] <- "Other"} } return(merged) }
Hi @MSMortensen
Thank you for the nice code here. I'm attempting to use it to merge any ASVs that are <1%. My output is from DAD2 if that makes a difference.
When I try to use:
ps.gen <- tax_glom(Air_snow_16S_ps_NEW, "Genus", bad_empty=c(NA, "", " ", "\t")) ps.gen.1percent <- merge_low_abundance(ps.gen)
I get the error:
Error in merge_taxa.indices.internal(x, eqtaxa, archetype) : invalid archetype provided.
Google and another Phyloseq thread led me to believe this might be due to NAs in the tax_table, however, it makes it past the tax_glom
script (as long as I use the bad_empty=c(NA, "", " ", "\t"
flag), and when I view the tax_table I'm not seeing any NAs. Any ideas what this might be about? Unfortunately, my knowledge is not strong enough to be able to problem-solve this myself.
Hi Chris, I quick guess is that the function expects 7 columns in the tax table, but as you have merged to genus level, you will only have 6. You can edit the line: tax_table(merged)[i,1:7] <- "Other" to tax_table(merged)[i,1:6] <- "Other" But, since my comments the function has been implemented in the DAtest package (https://github.com/Russel88/DAtest) in the function preDA. I think this implementation is a bit more robust and easier to work with, so my recommendation would be to use that function (it is also easier to cite that way). Best, Martin
Hi Chris, I quick guess is that the function expects 7 columns in the tax table, but as you have merged to genus level, you will only have 6. You can edit the line: tax_table(merged)[i,1:7] <- "Other" to tax_table(merged)[i,1:6] <- "Other" But, since my comments the function has been implemented in the DAtest package (https://github.com/Russel88/DAtest) in the function preDA. I think this implementation is a bit more robust and easier to work with, so my recommendation would be to use that function (it is also easier to cite that way). Best, Martin
@MSMortensen Thank you for the prompt response!
Good catch, but based on your comment above to another user I had already changed that:
merge_low_abundance <- function(Air_snow_16S_ps_NEW_clean, threshold=0.05){
transformed <- transform_sample_counts(Air_snow_16S_ps_NEW_clean, function(x) x/sum(x))
otu.table <- as.data.frame(otu_table(transformed))
otu.list <- row.names(otu.table[rowMeans(otu.table) < threshold,])
merged <- merge_taxa(transformed, otu.list, 1)
for (i in 1:dim(tax_table(merged))[1]){
if (is.na(tax_table(merged)[i,2])){
taxa_names(merged)[i] <- "Under 1%"
tax_table(merged)[i,1:6] <- "Under 1%"}
}
return(merged)
}
I didn't realize this was now implemented into DAtest. I will go ahead and give it a shot. Thanks for your help. Cheers!
Hello, I would like to create a 100% stacked bar plot for taxa collapsed to the genus level. I did it by using R to calculate the relative abundance at genus level, then picking up the top 20 taxa and extract genera with rel-ab >1% , then move to excel and copy these values as % and group the rest in others column. Actually, it gives me 100% bar graphs, but in publications they are using stacked bar chart generated in R. I found two solutions from github; first (#494) generated the chart but not displayed as 100% barchart. The second (#901) gave me weird graph, only one column and genera are not displayed as a fill to the column, plus when displaying the dataframe content, I found it empty, just have columns' headings??
May any please help me to solve this problem. Best