joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
584 stars 187 forks source link

conflicting results between Shannon, Simpson, and Fisher indices #800

Closed coyk closed 7 years ago

coyk commented 7 years ago

Hello, I am constructing boxplots and rarefaction curves of Shannon, Simpson, and Fisher indices boxplot rarefaction

The boxplot shows Fisher index values that are the opposite of Shannon and Simpson. The rarefaction curves based on Fisher indices didn't really work at all.

The code for the boxplots I used was library("phyloseq") library("ggplot2") library("scales") library("grid") theme_set(theme_bw()) library("DESeq2") Bathy_ps1_OTU_table <- import_mothur (mothur_shared_file = "/....unique_list.shared", mothur_constaxonomy_file ="/.....cons.taxonomy", cutoff=NULL) Bathy_ps1_sample_data <- data.frame(read.csv(".../Bathy_data.csv", header = TRUE, row.names = 1)) Bathy_ps1_sample_mapping <- sample_data(data.frame(Site = Bathy_ps1_sample_data["site"], Mitotype = Bathy_ps1_sample_data["mitotype"], Dive = Bathy_ps1_sample_data["dive"], row.names = sample_names(Bathy_ps1_OTU_table))) Bathy_ps1_exp <- merge_phyloseq(Bathy_ps1_OTU_table, Bathy_ps1_sample_mapping) Bathy_ps1_exp_trim <- prune_taxa(taxa_sums(Bathy_ps1_exp) > 1, Bathy_ps1_exp) plot_richness(Bathy_ps1_exp, x = "site", color = "dive", measures=c("Observed", "Shannon", "Simpson", "Fisher"), scales = "free_y", nrow = 1) + geom_boxplot()

For Rarefaction:

combining sample info and OTU_table, this is creating the phyloseq data object

Bathy_ps1_exp <- merge_phyloseq(Bathy_ps1_OTU_table, Bathy_ps1_sample_mapping) rare_curve_ps1 <- Bathy_ps1_exp rare_curve_ps1 sample_sums(rare_curve_ps1) set.seed(42) calculate_rarefaction_curves <- function(rare_curve_ps1, measures, depths) { require('plyr') # ldply require('reshape2') # melt

estimate_rarified_richness <- function(rare_curve_ps1, measures, depth) { if(max(sample_sums(rare_curve_ps1)) < depth) return() rare_curve_ps1 <- prune_samples(sample_sums(rare_curve_ps1) >= depth, rare_curve_ps1)

rarified_rare_curve_ps1 <- rarefy_even_depth(rare_curve_ps1, depth, verbose = FALSE)

alpha_diversity <- estimate_richness(rarified_rare_curve_ps1, measures = measures)

# as.matrix forces the use of melt.array, which includes the Sample names (rownames)
molten_alpha_diversity <- melt(as.matrix(alpha_diversity), varnames = c('Sample', 'Measure'), value.name = 'Alpha_diversity')

molten_alpha_diversity

}

names(depths) <- depths # this enables automatic addition of the Depth to the output by ldply rarefaction_curve_data <- ldply(depths, estimate_rarified_richness, rare_curve_ps1 = rare_curve_ps1, measures = measures, .id = 'Depth', .progress = ifelse(interactive(), 'text', 'none'))

convert Depth from factor to numeric

rarefaction_curve_data$Depth <- as.numeric(levels(rarefaction_curve_data$Depth))[rarefaction_curve_data$Depth]

rarefaction_curve_data }

rarefaction_curve_data <- calculate_rarefaction_curves(rare_curve_ps1, c('Observed', 'Shannon', 'Simpson','Fisher'), rep(c(1, 10, 100, 1000, 1:100 * 10000), each = 10)) summary(rarefaction_curve_data)

Originally I included Fisher but the resultant graph was really weird.

rarefaction_curve_data_summary <- ddply(rarefaction_curve_data, c('Depth', 'Sample', 'Measure'), summarise, Alpha_diversity_mean = mean(Alpha_diversity), Alpha_diversity_sd = sd(Alpha_diversity)) rarefaction_curve_data_summary_verbose <- merge(rarefaction_curve_data_summary, data.frame(sample_data(rare_curve_ps1)), by.x = 'Sample', by.y = 'row.names')

library('ggplot2')

ggplot( data = rarefaction_curve_data_summary_verbose, mapping = aes( x = Depth, y = Alpha_diversity_mean, ymin = Alpha_diversity_mean - Alpha_diversity_sd, ymax = Alpha_diversity_mean + Alpha_diversity_sd, colour = Sample, group = Sample ) ) + geom_line( ) + geom_pointrange( ) + facet_wrap( facets = ~ Measure, scales = 'free_y' )

joey711 commented 7 years ago

yep, doesn't look like the Fisher index worked on your data at all. Check the requirements of that index against your data to see why it failed.

gazollavolpiano commented 2 years ago

Hello! I had the same problem creating Fisher curves.

To correct this you should not compute Fisher when the sample has just singletons.