joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
584 stars 187 forks source link

Differential Abundance Analysis #1434

Open roshanipatel12 opened 3 years ago

roshanipatel12 commented 3 years ago

I am doing some differential analysis of 16s rRNA data and following code in this vignette: https://joey711.github.io/phyloseq-extensions/DESeq2.htm

sigtab <- res[(res$padj < alpha), ]
sigtab <- cbind(as(sigtab, "data.frame"), as(tax_table(ps2)[rownames(sigtab),], "matrix"))

However I get this error message: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent

I would be grateful for any help regarding this. I have seen that it has been discussed in some other chats. One suggestion was that "it likely means no taxa were identified as differentially abundant". If this is the case I would be keen to know if anyone has suggestions for how to show this.

Many thanks in abundance.

Roshani

ycl6 commented 3 years ago

@roshanipatel12 Did you use ps2 as your input for phyloseq_to_deseq2?

You can use nrow(as(sigtab, "data.frame")) and nrow(as(tax_table(ps2)[rownames(sigtab), ], "matrix")) to check if the rows counts are the same

Below I used the GlobalPatterns dataset as an example:

library(phyloseq)
library(DESeq2)

data(GlobalPatterns)

pruned <- subset_samples(GlobalPatterns, SampleType != "Mock")

sample_data(pruned)$Type <- "Environmental"
sample_data(pruned)[sample_data(pruned)$SampleType %in% c("Feces","Skin","Tongue"),]$Type <- "Human"

diagdds = phyloseq_to_deseq2(pruned, ~ Type)
gm_mean = function(x, na.rm=TRUE){
  exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
}
geoMeans = apply(counts(diagdds), 1, gm_mean)
diagdds = estimateSizeFactors(diagdds, geoMeans = geoMeans)
diagdds = DESeq(diagdds, fitType="local")

res = results(diagdds)
res = res[order(res$padj, na.last=NA),]
sigtab = res[(res$padj < 0.01),]
# Check how many taxa has padj less than 0.01
> table(res$padj < 0.01)

FALSE  TRUE 
17692   875 

# Check sigtab row counts
> nrow(as(sigtab, "data.frame"))
[1] 875

# Check subsetted tax_table row counts
> nrow(as(tax_table(pruned)[rownames(sigtab), ], "matrix"))
[1] 875
# Combine
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(pruned)[rownames(sigtab), ], "matrix"))

> head(sigtab)
       baseMean log2FoldChange    lfcSE       stat       pvalue         padj  Kingdom          Phylum               Class             Order             Family           Genus                    Species
470178 25.30546       24.29880 2.117840  11.473388 1.794923e-30 3.332633e-26 Bacteria  Actinobacteria      Actinobacteria   Actinomycetales Corynebacteriaceae Corynebacterium Corynebacteriumaurimucosum
25165  24.43550       24.25251 2.263361  10.715262 8.631259e-27 8.012829e-23 Bacteria   Bacteroidetes         Bacteroidia     Bacteroidales     Bacteroidaceae     Bacteroides                       <NA>
215700 14.50001      -22.40502 2.169335 -10.328062 5.261292e-25 3.256213e-21 Bacteria   Acidobacteria                <NA>              <NA>               <NA>            <NA>                       <NA>
74059  27.97749       24.44377 2.439478  10.020081 1.244010e-23 5.774385e-20 Bacteria    Fusobacteria        Fusobacteria   Fusobacteriales   Fusobacteriaceae    Leptotrichia                       <NA>
469903 10.18470       23.03120 2.507301   9.185655 4.090296e-20 1.175106e-16 Bacteria  Proteobacteria Gammaproteobacteria Cardiobacteriales Cardiobacteriaceae Cardiobacterium     Cardiobacteriumhominis
577212 19.71683      -22.82875 2.483045  -9.193851 3.790223e-20 1.175106e-16 Bacteria Verrucomicrobia            Opitutae   Puniceicoccales   Puniceicoccaceae            <NA>                       <NA>