Closed zapratte closed 7 years ago
There are a few ways I can think of doing it - I'd say the best ways are to filter out Genus level OTU's in phyloseq and convert to DESeq or pull it from the data frame of the DESeq object (I believe this will be harder). Use subset_taxa to achieve this
assuming your data puts "NA" in your taxonomy table, this will work. replace this with "" blank characters or whatever your table uses
GenusFiltered <- subset_taxa(data, Genus != "NA" & Species == "NA")
I would suggest you agglomerate to the genus level, using the tax_glom
function
physeq = merge_phyloseq(biomfile, tree, map)
GenusFiltered <- subset_taxa(physeq, Genus != "NA" & Species == "NA")
physeqGenus = tax_glom(GenusFiltered, "Genus")
And then go ahead and do your analysis using the typical phyloseq code, etc.
Hi Joey, I just wanted to provide some quick feedback. When I agglomerate to the genus level, I run into some difficulty because anything unclassified or "other" agglomerates. So- if I have two different OTUs from two different Orders, yet are considered "Other" in each Order, they will be combined. The only way I've been able to use DESeq2 at the Genus level is to import a tabulated OTU file already at the Genus level without a .tre file, and parse taxonomy by hand after I get my results. I'm happy to provide more info if needed....
Hello @zapratte
if I have two different OTUs from two different Orders, yet are considered "Other" in each Order, they will be combined.
Are you sure this is happening? When I use tax_glom()
, it only combines at a given level if higher taxonomic levels match. For example:
> glom5 <- tax_glom(phyloseq, "Rank5")
> glom5.unknown <- subset_taxa(glom5, Rank5 == "f__")
> glom5.unknown
phyloseq-class experiment-level object
otu_table() OTU Table: [ 66 taxa and 2 samples ]
sample_data() Sample Data: [ 2 samples by 15 sample variables ]
tax_table() Taxonomy Table: [ 66 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 66 tips and 65 internal nodes ]
> head(tax_table(glom5.unknown))
Taxonomy Table: [6 taxa by 7 taxonomic ranks]:
Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 Rank7
OTU_705 "k__Bacteria" "p__Firmicutes" "c__Bacilli" "o__Bacillales" "f__" NA NA
OTU_1406 "k__Bacteria" "p__OP11" "c__OP11-3" "o__" "f__" NA NA
OTU_498 "k__Bacteria" "p__Bacteroidetes" "c__Bacteroidia" "o__Bacteroidales" "f__" NA NA
OTU_1032 "k__Bacteria" "p__Bacteroidetes" "c__Sphingobacteriia" "o__Sphingobacteriales" "f__" NA NA
OTU_1210 "k__Bacteria" "p__Bacteroidetes" "c__[Saprospirae]" "o__[Saprospirales]" "f__" NA NA
OTU_735 "k__Bacteria" "p__Cyanobacteria" "c__4C0d-2" "o__MLE1-12" "f__" NA NA
Notice that while I used tax_glom()
at Rank5, I still had many results with the exact same Rank5, but different Ranks 1-4.
What happens with your data?
I would suggest you agglomerate to the genus level, using the
tax_glom
functionphyseq = merge_phyloseq(biomfile, tree, map) GenusFiltered <- subset_taxa(physeq, Genus != "NA" & Species == "NA") physeqGenus = tax_glom(GenusFiltered, "Genus")
And then go ahead and do your analysis using the typical phyloseq code, etc.
GenusFiltered <- subset_taxa(physeq, !is.na(Genus) & is.na(Species)
if you get the error: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent
Got this using R 3.5
When I agglomerate to the the phylum level (or any other level for that matter) and run the Deseq analysis the output table still includes lower levels of taxonomy. Can I ignore these lower levels and only focus on the chosen level?
Hi, I am performing deseq2 to determine differently abundant genus across 3 rhizosphere samples. I tried comparing the first two species, however when I ran the codes I am getting an error. kostic = subset_samples(s5_working, SPECIES != "A,B") diagdds = phyloseq_to_deseq2(kostic, ~SPECIES) dds <- DESeq(diagdds,test="Wald", fitType="parametric") res <- results(dds, contrast=c("SPECIES","A","B")) alpha = 0.05 sigtab = res[which(res$padj < alpha), ] sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(kostic)[rownames(sigtab), ], "matrix"))
Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent
Dear Gemthomasbarry,
I am getting the same error message - did you ever find a resolution to this?
Many thanks, Roshani
If you are getting the error "Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent" it likely means no taxa were identified as differentially abundant.
Dear gcuster1991, Thank you for your response, if this is the case do you have a recommendation for how I can show in some sort of statistical analysis that there is no taxa that were differentially abundant please? Many thanks, Roshani
You could change your alpha level to higher than 0.05 (or whatever you have it specified at) to see when you start to see taxa showing up.
Dear gcuster1991,
Thank you for this suggestion. I shall give this a go.
Bw
Roshani
If you are getting the error "Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent" it likely means no taxa were identified as differentially abundant.
Hello
Any advice on how to have a visualisation of Differential genes at the genus level rather than phylum? by making changes to the phyloseq object? making changes in the phyloseq object like aforementioned as a solution in this issue produce the following error in downstream analysis
physeq = merge_phyloseq(biomfile, tree, map)
GenusFiltered <- subset_taxa(physeq, Genus != "NA" & Species == "NA")
physeqGenus = tax_glom(GenusFiltered, "Genus")
"Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent"
although I have phylum taxa differential abundance
This isn't an issue per say, but I'm not entirely sure where to put this. We'd like to conduct analyses (particularly DESeq2 and heat maps) at the genus level, rather than the OTU level. I'm quite clumsy at R. My current workflow for DESeq2 is mostly bits and pieces taken from other scripts: map<-import_qiime_sample_data(file.choose()) tree<-read_tree(file.choose()) biomfile<-import_biom(file.choose(),parseFunction=parse_taxonomy_greengenes) data<-merge_phyloseq(biomfile,tree,map) deseqdata<-phyloseq_to_deseq2(data,~Microbiome) gm_mean<-function(x, na.rm=TRUE){exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))} geoMeans<-apply(counts(deseqdata),1,gm_mean) deseqdata<-estimateSizeFactors(deseqdata,geoMeans=geoMeans) deseqdata<-DESeq(deseqdata,fitType="local") res<-results(deseqdata,contrast=c("Microbiome","Factor1","Factor2")) alpha<-0.01 sigtab<-res[which(res$padj<alpha),] sigtab<-cbind(as(sigtab,"data.frame"),as(tax_table(data)[rownames(sigtab),],"matrix")) write.table(sigtab,"SidDifSpecies.txt",sep="\t")
Any help manipulating the data and/or script to analyze at the genus level would be greatly appreciated. Please excuse the clumsy post and code- it's my first one.