joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Problem with tree - labeling nodes #1081

Open lara-whoi opened 5 years ago

lara-whoi commented 5 years ago

Hello, I am analyzing the alpha and beta diversity of a dataset. I imported data from DADA2 and attempted to build a maximum-likelihood (ML) tree. My problem is that the ML tree appears to have the full ASV sequences as tip labels. Could you suggest which step I might have missed? Second, I have many ASVs of very low abundance. Do you have any suggestions for pruning taxa, beyond removing those that aren't present in any samples?

Thank you!

fit <- pml(treeNJ, data = phang.align)
fitGTR <- update(fit, k=4, inv=0.2)
fitGTR <- optim.pml(fitGTR, model = "GTR", optInv = TRUE, optGamma = TRUE, rearrangement = "stochastic", control = pml.control(trace = 0))

set.seed(711)
phy_tree(ps) <- root(phy_tree(ps), sample(taxa_names(ps), 1), resolve.root = TRUE)
is.rooted(phy_tree(ps))
[1] TRUE

ps
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2662 taxa and 16 samples ]
tax_table()   Taxonomy Table:    [ 2662 taxa by 6 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 2662 tips and 2660 internal nodes ]

ntaxa(ps)
[1] 2662
nsamples(ps)
[1] 16
sample_names(ps)
 [1] "D.0.3.6.C"    "D.0.3.6.O"    "D.100.3.24.C" "D.100.3.24.O" "D.100.3.6.C"  "D.100.3.6.O"  "D.50.3.6.C"
 [8] "D.50.3.6.O"   "R.0.3.6.C"    "R.0.3.6.O"    "R.100.3.24.C" "R.100.3.24.O" "R.100.3.6.C"  "R.100.3.6.O"
[15] "R.50.3.6.C"   "R.50.3.6.O"
rank_names(ps)
[1] "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"

phy_tree(ps)

Phylogenetic tree with 2662 tips and 2661 internal nodes.

Tip labels: CGTTACTCGGAATCACTGGGCGTAAAGAGCATGTAGGCTGGTTTGTAAGTTGGAAGTGAAATCCTATGGCTCAACCATAGAACTGCTTCCAAAACTACATACCTAGAGTATGGGAGAGGTAGATGGAATTTCTGGTGTAGGGGTAAAATCCGTAGAGATCAGAAGGAATACCGATTGCGAAGGCGATCTACTGGAACATTACTGACGCTGAGATGCGAAAGCGTGGGGAGCA

mikemc commented 5 years ago

Regarding the long tip-labels, try doing what I suggest here to store the ASV sequences in the phyloseq object and then create shorter taxon names (after adding the tree to the phyloseq object, as you've done above).
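For readers without access to the linked suggestion, here is a minimal sketch of one common way to do this. It is an assumption about what the approach looks like, not necessarily the exact code from the link. The toy `ps` object, the sample names `S1`/`S2`, and the `ASV1`, `ASV2`, ... naming scheme are all hypothetical stand-ins; with real data you would skip the toy setup and start from your own `ps`.

```r
library(phyloseq)
library(Biostrings)
library(ape)

# Hypothetical toy stand-in for `ps`: three ASVs whose taxa names
# (and tree tip labels) are the full sequences, as in the question.
seqs <- c("ACGTACGTAC", "ACGTTCGTAC", "TCGTACGAAC")
otu <- matrix(c(10, 0, 3, 5, 2, 8), nrow = 3,
              dimnames = list(seqs, c("S1", "S2")))
set.seed(711)
tree <- rtree(3, tip.label = seqs)
ps <- phyloseq(otu_table(otu, taxa_are_rows = TRUE), phy_tree(tree))

# Store the sequences in the refseq slot, keyed by the current taxa names.
dna <- DNAStringSet(taxa_names(ps))
names(dna) <- taxa_names(ps)
ps <- merge_phyloseq(ps, dna)

# Replace the long sequence names with short identifiers. This is done
# after the tree is already in `ps`, so the tip labels are renamed too.
taxa_names(ps) <- paste0("ASV", seq(ntaxa(ps)))

phy_tree(ps)$tip.label  # short names
refseq(ps)              # original sequences, still retrievable by short name
```

Renaming only after the tree is part of the phyloseq object matters because `taxa_names<-` updates every component together; renaming first would leave the tree's tip labels (still full sequences) unable to match the new taxa names.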