joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
584 stars 187 forks source link

Plot bar chart with the top 10 predomenat classes #1505

Open LeonardosMageiros opened 3 years ago

LeonardosMageiros commented 3 years ago

Hi!

I have a taxonomic table with the read counts of 14 samples mapped down to the species level. Below is a small part of this table:

image

These samples are grouped in 4 different categories based on their locations.

What I would like to do is to plot a barplot that shows the the changes in the 10 most abundant Classes in my data.

The first thing I do is to normalise my counts like so:

total = median(sample_sums(pseq)) standf = function(x, t=total) round(t * (x / sum(x))) pseq = transform_sample_counts(pseq, standf)

Then following a tutorial I found online I do the following:

TopNOTUs <- names(sort(taxa_sums(pseq), TRUE)[1:300]) taxtab10 = cbind(tax_table(pseq), genus_10 = NA) taxtab10[TopNOTUs, "genus_10"] <- as(tax_table(pseq)[TopNOTUs, "Class"], "character") tax_table(pseq) <- tax_table(taxtab10) column <- "Group"

Nonetheless if I understand correctly this code it just keeps my 300 most abundant species which in my case correspond to ~15 Classes.

Plotting this using the following command: plot_bar(pseq, fill="genus_10") + facet_wrap(as.formula(paste("~", column)), scales="free_x", nrow=1) + geom_bar(aes(color=genus_10, fill=genus_10), stat="identity", position="stack", show.legend = FALSE) + theme(axis.text.x=element_blank()) + theme( legend.title = element_text(size = 12), legend.text = element_text(size = 8) )

Gives me that figure:

image

Nonetheless this is not what I want to do because y axis is not relative abundance, I have not actually the 10 most predominant classes but the classes that correspond to my 300 most predominant species, and ideally I would like to have the NA gone.

I was able to summarize my groups using merge_samples and create percentages using the following

pseq_fraction <- merge_samples(pseq, column) sample_data(pseq_fraction)$column <- levels(sample_data(pseq_fraction)$column) pseq_fraction = transform_sample_counts(pseq_fraction, function(x) 100 * x/sum(x))

To produce using the following plot function the following: plot_bar(pseq_fraction, fill = "genus_10") + geom_bar(aes(color=genus_10, fill=genus_10), stat="identity", position="stack") image

So my main question is if there is a way to correctly find my top 10 predominant Classes based on all my species and plot those in a plot similar to my last one? Would it be possible to remove the percentage of the NA in the same plot as well?

Thank you in advance and apologies for the long post.

Best Leonardos

vee3capp commented 2 years ago

This is the gplot code I used and I made a less than 5% category

plot<-transform_sample_counts(m.aa, function(x) 100 * x/sum(x)) glom <- tax_glom(plot, taxrank = 'class') glom # should list taxa as phyla data_glom<- psmelt(glom) # create dataframe from phyloseq object

data_glom$class <- as.character(data_glom$class) #convert to character

simple way to rename phyla with < 5% abundance

data_glom$class[data_glom$Abundance < 0.05] <- "< 5% abundance"

Count phyla to set color palette

Count = length(unique(data_glom$class)); Count unique(data_glom$class) #add into line below data_glom$class <- factor(data_glom$class, levels = c("Bacilli", "Clostridia", "Gammaproteobacteria", "Bacteroidia","Betaproteobacteria", "Actinobacteria", "Flavobacteriia", "Erysipelotrichi",
"Coriobacteriia", "Fusobacteriia", "Epsilonproteobacteria", "Alphaproteobacteria" , "Spirochaetes", "< 5% abundance"))

spatial_plot <- ggplot(data_glom, aes(x=Sample, y=Abundance, fill=class)) + facet_grid(~decomp.stage, scales = "free")

spatial_plot + geom_bar(aes(), stat="identity", position="stack") + labs (title = "Abdominal Area", x="Stage of Decomposition", y= "Relative Abundance", ) + scale_fill_manual(values = c("springgreen3","royalblue4","gold1","darkorchid","tomato3","deeppink","firebrick","saddlebrown","cyan2","darkseagreen2","salmon2", "lemonchiffon1","darkorange2","gray47","navy","plum","lightblue4","black","cornsilk2"))+ theme(legend.position="bottom", plot.title = element_text(hjust=0.5)) + guides(fill=guide_legend(nrow=4))