joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
582 stars 187 forks source link

Analysis (DESeq2) at the genus level (rather than OTU) #683

Closed zapratte closed 7 years ago

zapratte commented 7 years ago

This isn't an issue per say, but I'm not entirely sure where to put this. We'd like to conduct analyses (particularly DESeq2 and heat maps) at the genus level, rather than the OTU level. I'm quite clumsy at R. My current workflow for DESeq2 is mostly bits and pieces taken from other scripts: map<-import_qiime_sample_data(file.choose()) tree<-read_tree(file.choose()) biomfile<-import_biom(file.choose(),parseFunction=parse_taxonomy_greengenes) data<-merge_phyloseq(biomfile,tree,map) deseqdata<-phyloseq_to_deseq2(data,~Microbiome) gm_mean<-function(x, na.rm=TRUE){exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))} geoMeans<-apply(counts(deseqdata),1,gm_mean) deseqdata<-estimateSizeFactors(deseqdata,geoMeans=geoMeans) deseqdata<-DESeq(deseqdata,fitType="local") res<-results(deseqdata,contrast=c("Microbiome","Factor1","Factor2")) alpha<-0.01 sigtab<-res[which(res$padj<alpha),] sigtab<-cbind(as(sigtab,"data.frame"),as(tax_table(data)[rownames(sigtab),],"matrix")) write.table(sigtab,"SidDifSpecies.txt",sep="\t")

Any help manipulating the data and/or script to analyze at the genus level would be greatly appreciated. Please excuse the clumsy post and code- it's my first one.

StackerEd commented 7 years ago

There are a few ways I can think of doing it - I'd say the best ways are to filter out Genus level OTU's in phyloseq and convert to DESeq or pull it from the data frame of the DESeq object (I believe this will be harder). Use subset_taxa to achieve this

assuming your data puts "NA" in your taxonomy table, this will work. replace this with "" blank characters or whatever your table uses

GenusFiltered <- subset_taxa(data, Genus != "NA" & Species == "NA")

joey711 commented 7 years ago

I would suggest you agglomerate to the genus level, using the tax_glom function

physeq = merge_phyloseq(biomfile, tree, map)
GenusFiltered <- subset_taxa(physeq, Genus != "NA" & Species == "NA")
physeqGenus = tax_glom(GenusFiltered, "Genus")

And then go ahead and do your analysis using the typical phyloseq code, etc.

zapratte commented 7 years ago

Hi Joey, I just wanted to provide some quick feedback. When I agglomerate to the genus level, I run into some difficulty because anything unclassified or "other" agglomerates. So- if I have two different OTUs from two different Orders, yet are considered "Other" in each Order, they will be combined. The only way I've been able to use DESeq2 at the Genus level is to import a tabulated OTU file already at the Genus level without a .tre file, and parse taxonomy by hand after I get my results. I'm happy to provide more info if needed....

colinbrislawn commented 7 years ago

Hello @zapratte

if I have two different OTUs from two different Orders, yet are considered "Other" in each Order, they will be combined.

Are you sure this is happening? When I use tax_glom(), it only combines at a given level if higher taxonomic levels match. For example:

> glom5 <- tax_glom(phyloseq, "Rank5")
> glom5.unknown <- subset_taxa(glom5, Rank5 == "f__")
> glom5.unknown
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 66 taxa and 2 samples ]
sample_data() Sample Data:       [ 2 samples by 15 sample variables ]
tax_table()   Taxonomy Table:    [ 66 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 66 tips and 65 internal nodes ]
> head(tax_table(glom5.unknown))
Taxonomy Table:     [6 taxa by 7 taxonomic ranks]:
         Rank1         Rank2              Rank3                 Rank4                   Rank5 Rank6 Rank7
OTU_705  "k__Bacteria" "p__Firmicutes"    "c__Bacilli"          "o__Bacillales"         "f__" NA    NA   
OTU_1406 "k__Bacteria" "p__OP11"          "c__OP11-3"           "o__"                   "f__" NA    NA   
OTU_498  "k__Bacteria" "p__Bacteroidetes" "c__Bacteroidia"      "o__Bacteroidales"      "f__" NA    NA   
OTU_1032 "k__Bacteria" "p__Bacteroidetes" "c__Sphingobacteriia" "o__Sphingobacteriales" "f__" NA    NA   
OTU_1210 "k__Bacteria" "p__Bacteroidetes" "c__[Saprospirae]"    "o__[Saprospirales]"    "f__" NA    NA   
OTU_735  "k__Bacteria" "p__Cyanobacteria" "c__4C0d-2"           "o__MLE1-12"            "f__" NA    NA   

Notice that while I used tax_glom() at Rank5, I still had many results with the exact same Rank5, but different Ranks 1-4.

What happens with your data?

jsan4christ commented 5 years ago

I would suggest you agglomerate to the genus level, using the tax_glom function

physeq = merge_phyloseq(biomfile, tree, map)
GenusFiltered <- subset_taxa(physeq, Genus != "NA" & Species == "NA")
physeqGenus = tax_glom(GenusFiltered, "Genus")

And then go ahead and do your analysis using the typical phyloseq code, etc.

GenusFiltered <- subset_taxa(physeq, !is.na(Genus) & is.na(Species)

if you get the error: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent

Got this using R 3.5

gcuster1991 commented 5 years ago

When I agglomerate to the the phylum level (or any other level for that matter) and run the Deseq analysis the output table still includes lower levels of taxonomy. Can I ignore these lower levels and only focus on the chosen level?

Gemthomasbarry commented 4 years ago

Hi, I am performing deseq2 to determine differently abundant genus across 3 rhizosphere samples. I tried comparing the first two species, however when I ran the codes I am getting an error. kostic = subset_samples(s5_working, SPECIES != "A,B") diagdds = phyloseq_to_deseq2(kostic, ~SPECIES) dds <- DESeq(diagdds,test="Wald", fitType="parametric") res <- results(dds, contrast=c("SPECIES","A","B")) alpha = 0.05 sigtab = res[which(res$padj < alpha), ] sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(kostic)[rownames(sigtab), ], "matrix"))

Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent

roshanipatel12 commented 3 years ago

Dear Gemthomasbarry,

I am getting the same error message - did you ever find a resolution to this?

Many thanks, Roshani

gcuster1991 commented 3 years ago

If you are getting the error "Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent" it likely means no taxa were identified as differentially abundant.

roshanipatel12 commented 3 years ago

Dear gcuster1991, Thank you for your response, if this is the case do you have a recommendation for how I can show in some sort of statistical analysis that there is no taxa that were differentially abundant please? Many thanks, Roshani

gcuster1991 commented 3 years ago

You could change your alpha level to higher than 0.05 (or whatever you have it specified at) to see when you start to see taxa showing up.

roshanipatel12 commented 3 years ago

Dear gcuster1991,

Thank you for this suggestion. I shall give this a go.

Bw

Roshani

marwa38 commented 2 years ago

If you are getting the error "Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent" it likely means no taxa were identified as differentially abundant.

Hello

Any advice on how to have a visualisation of Differential genes at the genus level rather than phylum? by making changes to the phyloseq object? making changes in the phyloseq object like aforementioned as a solution in this issue produce the following error in downstream analysis

physeq = merge_phyloseq(biomfile, tree, map)
GenusFiltered <- subset_taxa(physeq, Genus != "NA" & Species == "NA")
physeqGenus = tax_glom(GenusFiltered, "Genus")

"Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent" although I have phylum taxa differential abundance image