joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Importing sequence file to existing phyloseq object #1591

Closed act837 closed 2 years ago

act837 commented 2 years ago

Hi

Is there a way to add an ASV sequence table to an existing phyloseq object?

I know that this is possible when making a new phyloseq object if you start with a count table where the row names are sequences, and you want to replace these with ASV names, and store the sequences in refseq within the phyloseq object. See the code below.

dna <- Biostrings::DNAStringSet(taxa_names(ps))
names(dna) <- taxa_names(ps)
ps <- merge_phyloseq(ps, dna)
taxa_names(ps) <- paste0("ASV", seq(ntaxa(ps)))
ps

However, I use DADA2 to process sequencing data, and I have modified the DADA2 workflow to give a count table with ASVs already as the row names.

I have a fasta file of ASV number and sequences. See extract below for how this is formatted.

ASV_1
TACGGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGGGTCCGCAGGTGGCATTGTAAGTCTGCTGTTAAAGAGTTTGGCTCAACCAAATAAAAGCAGTGGAAACTACAAAGCTAGAGTTTGGTCGGGGCAGAGGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAAGAACACCAGTGGCGAAGGCGCTCTGCTAGGCCGAGACTGACACTGAGGGACGAAAGCTAGGGGAGCGAATGGG
ASV_2
TACATAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTAGGTCGTTTGTTACGTCGGATGTGAAAACCTGAGGCTCAACCTCAGGCCTGCATTCGATACGGGCAAACTAGAGTTTGGTAGGGGAGACTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCAATGGCGAAGGCAGGTCTCTGGGCCAATACTGACACTGAGGAGCGAAAGTCTGGGGAGCGAACAGG
ASV_3
TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTTGTAAGTCGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCGTTCGAAACTGCAAGGCTAGAGTGTGTCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAATACCAATGGCGAAGGCAGCCCCCTGGGATAACACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGG

Is it possible to add this sequence table to refseq within a phyloseq object without going back to DADA2 to change my output files?

I hope I'm asking this question in the correct place - please let me know if not.

Thanks!

act837 commented 2 years ago

Figured it out, in case anybody has the same question: I just needed to import the FASTA file properly with readDNAStringSet().

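library(Biostrings)  # provides readDNAStringSet()
library(phyloseq)

# The FASTA headers must match the ASV names used in the count table
seqtab <- readDNAStringSet("sequences.fa", format = "fasta")

TAX <- tax_table(tax)                           # tax_table() expects a character matrix
ASV <- otu_table(counts, taxa_are_rows = TRUE)  # assumes counts is a matrix with ASVs as rows, as described above
samples <- sample_data(meta)
ps <- phyloseq(ASV, TAX, samples, seqtab)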
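
For the case in the original question, where the phyloseq object already exists, the merge_phyloseq() call from the first post should also accept sequences read from a FASTA file. The following is a minimal sketch rather than a tested recipe for any particular dataset: the toy count table, taxonomy, sequences, and temporary file are all made up for illustration, and it assumes the FASTA headers are identical to taxa_names(ps).

```r
library(phyloseq)
library(Biostrings)

# Toy stand-ins for the real data (hypothetical values)
asv_ids <- c("ASV_1", "ASV_2", "ASV_3")
counts <- matrix(c(10, 0, 5, 3, 7, 1), nrow = 3,
                 dimnames = list(asv_ids, c("S1", "S2")))
tax <- matrix("Bacteria", nrow = 3, ncol = 1,
              dimnames = list(asv_ids, "Kingdom"))

# An existing phyloseq object that has no refseq slot yet
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), tax_table(tax))

# Write a small FASTA file so the example is self-contained;
# in practice this would be the existing sequences file
fasta_path <- tempfile(fileext = ".fa")
toy_seqs <- DNAStringSet(c(ASV_1 = "ACGTACGTAC", ASV_2 = "ACGT", ASV_3 = "ACGTACGT"))
writeXStringSet(toy_seqs, fasta_path)

# Read the sequences and check that the names line up with the taxa names
seqs <- readDNAStringSet(fasta_path, format = "fasta")
stopifnot(setequal(names(seqs), taxa_names(ps)))

# Add the sequences to the existing object; they end up in refseq(ps)
ps <- merge_phyloseq(ps, seqs)
refseq(ps)
```

The name check is there because, as far as I understand it, phyloseq keeps only the taxa shared by all components, so a mismatch between FASTA headers and taxa names could silently drop ASVs.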

sghignone commented 2 months ago

Hi, any idea how to use prune_taxa() to filter taxa by sequence length (width, in DNAStringSet terms)? SG
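
One possible approach, sketched here under assumptions rather than offered as a confirmed answer from the thread: take the widths of the sequences stored in refseq(), select the taxa names that pass a length threshold, and hand those names to prune_taxa(). The toy data and the min_len threshold below are hypothetical.

```r
library(phyloseq)
library(Biostrings)

# Toy phyloseq object with reference sequences (hypothetical values)
asv_ids <- c("ASV_1", "ASV_2", "ASV_3")
counts <- matrix(c(10, 0, 5, 3, 7, 1), nrow = 3,
                 dimnames = list(asv_ids, c("S1", "S2")))
tax <- matrix("Bacteria", nrow = 3, ncol = 1,
              dimnames = list(asv_ids, "Kingdom"))
seqs <- DNAStringSet(c(ASV_1 = "ACGTACGTAC", ASV_2 = "ACGT", ASV_3 = "ACGTACGT"))
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), tax_table(tax), seqs)

# Hypothetical minimum sequence length to keep
min_len <- 8

# width() gives the length of each sequence in the DNAStringSet
seq_len <- width(refseq(ps))

# Select by name so the result does not depend on component ordering
keep <- names(refseq(ps))[seq_len >= min_len]

# Keep only taxa whose sequence is at least min_len bases long
ps_long <- prune_taxa(keep, ps)
ps_long
```

The same pattern should work for an upper bound or a length window by changing the logical condition on seq_len.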