joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Failure to merge files into a phyloseq object after renaming taxa, and sequences #1742

Closed emankhalaf closed 2 months ago

emankhalaf commented 2 months ago

Hi,

I am working on PacBio 16S sequences, using DADA2 for ASV inference. I already have a working phyloseq object, built with this code:

ps <- phyloseq(tax_table(tax), sample_data(samdf), otu_table(st.nochim, taxa_are_rows = FALSE), phy_tree(fitGTR$tree))

But I need to rebuild the object after replacing the sequences with ASV IDs, both in the row names of the tax_table and in the column names of the otu_table. So now the row names of the tax_table are ASV1, ASV2, ASV3, ... and the column names of the otu_table are ASV1, ASV2, ASV3, ... When I used the same code as above to merge the files, I got this error:

"Error in validObject(.Object) : invalid class “phyloseq” object: Component taxa/OTU names do not match. Taxa indices are critical to analysis. Try taxa_names()" Error : object 'x1' not found stop(msg, ": ", errors, domain = NA) validObject(.Object) initialize(value, ...) initialize(value, ...) new(Class = "phyloseq", tax_table = new("taxonomyTable", .Data = structure(c("Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria", ... do.call("new", c(list(Class = "phyloseq"), splatlist)) do.call("new", c(list(Class = "phyloseq"), splatlist)) phyloseq(tax_table(tax.export2), sample_data(samdf), otu_table(st.nochim.export2, taxa_are_rows = FALSE), phy_tree(fitGTR$tree))

So, I converted files into matrices and did the following:

setdiff(rownames(tax.export2), colnames(st.nochim.export2))
character(0)

all(colnames(st.nochim.export2) == rownames(tax.export2))
TRUE

ASV_matrix = otu_table(as.matrix(st.nochim.export2), taxa_are_rows = FALSE)
tax_matrix = tax_table(as.matrix(tax.export2))

meta_matrix = as.matrix(samdf)

setdiff(taxa_names(ASV_matrix), taxa_names(tax_matrix)) 
character(0)

all(rownames(ASV_matrix) == unlist(meta_matrix[,'SampleID']))
TRUE

ps_rename <- phyloseq(tax_table(tax_matrix),
                 sample_data(meta_matrix),
                 otu_table(ASV_matrix, taxa_are_rows = FALSE),
                 phy_tree(fitGTR$tree))

But I got this error:

Error in access(object, "sam_data", errorIfNULL) : sam_data slot is empty.

stop(slot, " slot is empty.")
access(object, "sam_data", errorIfNULL)
sample_data(meta_matrix)
sample_data(meta_matrix)
phyloseq(tax_table(tax_matrix), sample_data(meta_matrix), otu_table(ASV_matrix, taxa_are_rows = FALSE), phy_tree(fitGTR$tree))
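This second error appears to come from sample_data(meta_matrix) rather than from the renaming. as.matrix(samdf) returns a matrix, while phyloseq's sample_data() builds a sample table from a data.frame; given another kind of object it behaves as an accessor and looks for a sam_data slot, which a matrix does not have. A minimal sketch follows, with a hypothetical toy `samdf`; the phyloseq call is guarded so the block also runs where the package is not installed.

```r
# Toy sample metadata (hypothetical); row names must be the sample names.
samdf <- data.frame(SampleID = c("S1", "S2"),
                    Group    = c("control", "treated"),
                    row.names = c("S1", "S2"))

meta_matrix <- as.matrix(samdf)
is.data.frame(meta_matrix)  # FALSE: the conversion drops the data.frame class
is.data.frame(samdf)        # TRUE: this is what sample_data() can build from

# Pass the data frame itself instead of the matrix.
if (requireNamespace("phyloseq", quietly = TRUE)) {
  sam <- phyloseq::sample_data(samdf)
  print(phyloseq::sample_names(sam))
}
```

In the call above, that would mean passing sample_data(samdf) instead of sample_data(meta_matrix).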

Your help is appreciated!

benjjneb commented 2 months ago

Did you get a working phyloseq object to begin with, using the ASV sequences as identifiers (as are used in the dada2 R package)? If so, you can store the ASV sequences in the refseq slot of the ps object, and rename the taxa_names in the working ps object. See code below, from the dada2 tutorial hand-off to phyloseq section:

dna <- Biostrings::DNAStringSet(taxa_names(ps))
names(dna) <- taxa_names(ps)
ps <- merge_phyloseq(ps, dna)
taxa_names(ps) <- paste0("ASV", seq(ntaxa(ps)))
ps
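Because taxa_names(ps) <- ... is applied to the assembled object, the new IDs reach every component at once, the tree and the stored sequences included, so no component is left with the old names. The ID scheme itself is a positional map from sequence to "ASV" plus an index. A small base-R sketch of that map with hypothetical toy sequences; the commented refseq() line is an assumption about how the lookup would read on a real renamed `ps`.

```r
# Base-R view of the ID scheme used above (toy sequences, hypothetical).
seqs <- c("ACGTACGT", "TTGACCAA", "GGCATTCA")
ids  <- paste0("ASV", seq_along(seqs))
lookup <- data.frame(ASV = ids, sequence = seqs, stringsAsFactors = FALSE)

# Recover the sequence behind a short ID.
asv2_seq <- lookup$sequence[lookup$ASV == "ASV2"]
print(asv2_seq)

# With a real object `ps` renamed as above, the same lookup would go through
# the refseq slot (assumes the phyloseq and Biostrings packages are loaded):
# as.character(refseq(ps)["ASV2"])
```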

emankhalaf commented 2 months ago

Worked perfectly! Thanks so much!