joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
578 stars 187 forks source link

Plot_heatmap "error in svd" #290

Closed kfontanez closed 10 years ago

kfontanez commented 10 years ago

Joey-

I am trying to make a heatmap of log2 fold changes in abundance among samples. I have done this successfully before but this particular dataset is having issues. The error suggests there are missing values or NAs in my otu_table, which there are not. The values are finite and present in the matrix. What is wrong?

Thanks, Kristina

czy phyloseq-class experiment-level object otu_table() OTU Table: [ 9 taxa and 3 samples ] tax_table() Taxonomy Table: [ 9 taxa by 3 taxonomic ranks ] otu_table(czy) OTU Table: [9 taxa and 3 samples] taxa are rows LvS.LFC DvS.LFC LvD.LFC Path1 -0.19 -0.20 0.01 Path2 0.21 0.18 0.04 Path3 0.07 -0.65 0.72 Path4 0.29 -0.27 0.56 Path5 -0.13 0.59 -0.73 Path6 0.70 -0.26 0.96 Path7 -3.25 -2.27 -0.98 Path8 -0.18 0.63 -0.81 Path9 -0.02 0.84 -0.86 tax_table(czy) Taxonomy Table: [9 taxa by 3 taxonomic ranks]: Class ShortDescription
Path1 "GT" "glycosyltransferases"
Path2 "GH" "Glycoside hydrolases"
Path3 "CE" "Carbohydrate esterases"
Path4 "AA" "Auxiliary Activities"
Path5 "CBM" "carbohydrate-binding modules" Path6 "PL" "Polysaccharide Lyases"
Path7 "cohesin" "Cellulosome"
Path8 "dockerin" "Cellulosome"
Path9 "SLH" "Cellulosome"
Description
Path1 "formation of glycosidic bonds"
Path2 "hydrolysis and/or rearrangement of glycosidic bonds"
Path3 "hydrolysis of carbohydrate esters"
Path4 "redox enzymes that act in conjunction with CAZymes"
Path5 "a contiguous amino acid sequence within a carbohydrate-active enzyme with a discreet fold having carbohydrate-binding activity" Path6 "non-hydrolytic cleavage of glycosidic bonds"
Path7 "The cohesin-dockerin interaction is the crucial interaction for complex formation in the cellulosome"
Path8 "The cohesin-dockerin interaction is the crucial interaction for complex formation in the cellulosome"
Path9 "S-layer homology domain, anchoring cellulosome onto the bacterial cell surfaces"

plot_heatmap(czy) Error in svd(x, nu = 0) : infinite or missing values in 'x' In addition: Warning messages: 1: In metaMDS(veganify(physeq), distance, ...) : 'comm' has negative data: 'autotransform', 'noshare' and 'wascores' set to FALSE 2: In distfun(comm, method = distance, ...) : results may be meaningless because data have negative entries in method “bray” 3: In cmdscale(dist, k = k) : only 0 of the first 2 eigenvalues are > 0 traceback() 14: stop("infinite or missing values in 'x'") 13: svd(x, nu = 0) 12: prcomp.default(sol$points) 11: prcomp(sol$points) 10: monoMDS(dist, y = cmdscale(dist, k = k), k = k, maxit = maxit, ...) 9: metaMDSiter(dis, k = k, trymax = trymax, trace = trace, plot = plot, previous.best = previous.best, engine = engine, ...) 8: metaMDS(veganify(physeq), distance, ...) 7: ordinate(physeq, method, distance, ...) 6: eval(expr, envir, enclos) 5: eval(expr, pf) 4: withVisible(eval(expr, pf)) 3: evalVis(expr) 2: capture.output(ps.ord <- ordinate(physeq, method, distance, ...), file = NULL) 1: plot_heatmap(czy) is.na(otu_table(czy)) LvS.LFC DvS.LFC LvD.LFC Path1 FALSE FALSE FALSE Path2 FALSE FALSE FALSE Path3 FALSE FALSE FALSE Path4 FALSE FALSE FALSE Path5 FALSE FALSE FALSE Path6 FALSE FALSE FALSE Path7 FALSE FALSE FALSE Path8 FALSE FALSE FALSE Path9 FALSE FALSE FALSE is.infinite(otu_table(czy)) LvS.LFC DvS.LFC LvD.LFC Path1 FALSE FALSE FALSE Path2 FALSE FALSE FALSE Path3 FALSE FALSE FALSE Path4 FALSE FALSE FALSE Path5 FALSE FALSE FALSE Path6 FALSE FALSE FALSE Path7 FALSE FALSE FALSE Path8 FALSE FALSE FALSE Path9 FALSE FALSE FALSE is.finite(otu_table(czy)) LvS.LFC DvS.LFC LvD.LFC Path1 TRUE TRUE TRUE Path2 TRUE TRUE TRUE Path3 TRUE TRUE TRUE Path4 TRUE TRUE TRUE Path5 TRUE TRUE TRUE Path6 TRUE TRUE TRUE Path7 TRUE TRUE TRUE Path8 TRUE TRUE TRUE Path9 TRUE TRUE TRUE

joey711 commented 10 years ago

I don't recommend trying NMDS, or any sample-ordination, with only 3 samples. You should probably just inspect the distance values directly. I can't verify that it is the only problem, since I don't have your data, but I am not surprised that NMDS complains.

joey711 commented 10 years ago

Oh, I see. The ordination is being done implicitly within the plot_heatmap function by default. This is clearly explained in its documentation. If you just want to display the heatmap, but don't want to use any clustering technique to order the axes, you can specify the axis ordering explicitly with options to the plot_heatmap function, and no ordination will be attempted internally.

I could partially reproduce your issue on example data, but only get warnings. I think there is an even more severe issue with your data in this case. Some all-zero entries, or maybe one of the samples is exactly the same as another? It is impossible for me to troubleshoot without a reproducible example of your issue, which always means providing/specifying a dataset. The additional errors you've shown do not result from any of the calls to plot_heatmap that I have tried.

Here is an example at least illustrating the problem of using NMDS-ordering with only two samples...

Reproducible example, starting from data

Partially reproduce the issue with just ordination, and then with plot_heatmap, using an arbitrary subset of the GlobalPatterns dataset.

library("phyloseq")
data("GlobalPatterns")
# Keep only 3 large samples that have about the same library size, for
# convenience
keep3samples = c("LMEpi24M", "M11Fcsw", "M31Tong")
gp3 = prune_samples(keep3samples, GlobalPatterns)
# Keep the most abundant 40 OTUs in this subset.
gp3 = prune_taxa(names(sort(taxa_sums(gp3), TRUE)[1:40]), gp3)
gp3
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 40 taxa and 3 samples ]
## sample_data() Sample Data:       [ 3 samples by 7 sample variables ]
## tax_table()   Taxonomy Table:    [ 40 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 40 tips and 39 internal nodes ]
ordinate(gp3, "NMDS", "bray")
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0 
## Run 1 stress 0 
## ... procrustes: rmse 0.1831  max resid 0.2345 
## Run 2 stress 0 
## ... procrustes: rmse 0.3703  max resid 0.4948 
## Run 3 stress 0 
## ... procrustes: rmse 0.1692  max resid 0.2085 
## Run 4 stress 0 
## ... procrustes: rmse 0.1849  max resid 0.23 
## Run 5 stress 0 
## ... procrustes: rmse 0.152  max resid 0.1892 
## Run 6 stress 0 
## ... procrustes: rmse 0.1075  max resid 0.1271 
## Run 7 stress 0 
## ... procrustes: rmse 0.02285  max resid 0.02293 
## Run 8 stress 0 
## ... procrustes: rmse 0.1813  max resid 0.2318 
## Run 9 stress 0 
## ... procrustes: rmse 0.03845  max resid 0.04173 
## Run 10 stress 0 
## ... procrustes: rmse 0.3805  max resid 0.486 
## Run 11 stress 0 
## ... procrustes: rmse 0.3464  max resid 0.4863 
## Run 12 stress 0 
## ... procrustes: rmse 0.05561  max resid 0.05975 
## Run 13 stress 0 
## ... procrustes: rmse 0.1767  max resid 0.2157 
## Run 14 stress 0 
## ... procrustes: rmse 0.3144  max resid 0.4322 
## Run 15 stress 0 
## ... procrustes: rmse 0.382  max resid 0.4697 
## Run 16 stress 0 
## ... procrustes: rmse 0.06563  max resid 0.07325 
## Run 17 stress 0 
## ... procrustes: rmse 0.0436  max resid 0.04735 
## Run 18 stress 0 
## ... procrustes: rmse 0.136  max resid 0.1589 
## Run 19 stress 0 
## ... procrustes: rmse 0.2962  max resid 0.3806 
## Run 20 stress 0 
## ... procrustes: rmse 0.2849  max resid 0.356
## 
## Call:
## metaMDS(comm = veganify(physeq), distance = distance) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     wisconsin(sqrt(veganify(physeq))) 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0 
## Stress type 1, weak ties
## No convergent solutions - best solution after 20 tries
## Scaling: centring, PC rotation 
## Species: expanded scores based on 'wisconsin(sqrt(veganify(physeq)))'
plot_heatmap(gp3, taxa.label = "Genus")

unnamed-chunk-1

Solution to warning

The warning preceding the previous plot is expected because an ordination should not be attempted with only 3 samples. You can avoid this error and ordination altogether by various options in the plot_heatmap function. Here I will give the precise order of the samples to display, and then order the taxa by their phylum name.

plot_heatmap(gp3, taxa.label = "Genus", sample.order = sample_names(gp3), taxa.order = "Phylum")

solution1

plot_heatmap(gp3, taxa.label = "Genus", sample.label = "SampleType", sample.order = sample_names(gp3), 
    taxa.order = "Phylum")

solution2

I will close this issue for now unless/until further problems can be verified with a reproducible example. Hope this explanation helps solve at least part of the problem.

joey

kfontanez commented 10 years ago

Joey-

Thank you for your work to help resolve this issue. Explicitly setting the sample order and taxa order actually did fix my problem. Avoiding the ordination altogether was definitely the solution.

Thank you! Kristina