joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Tip labels are sequences rather than taxonomy assignments when using the tree in ITOL #780

Open zippymimosa opened 7 years ago

zippymimosa commented 7 years ago

Hi,

I generated a phyloseq object following the Bioconductor Workflow for Microbiome Data Analysis (https://f1000research.com/articles/5-1492/v2). The tree was constructed using the dada2 and phangorn packages. I then moved on to phyloseq, where I can plot trees with the plot_tree function using taxonomic information as tip labels.

Now I want to use ITOL to generate fancier trees. I extracted the tree from the ps object, saved it with the ape::write.tree function, and loaded it into ITOL. However, the tip labels are not taxonomic assignments but the sequences themselves, like ATGCCCCC.

In the Bioconductor workflow, the tree was constructed using the following code:

    seqs <- getSequences(seqtab)
    names(seqs) <- seqs # This propagates to the tip labels of the tree
    alignment <- AlignSeqs(DNAStringSet(seqs), anchor=NA)

I checked seqs, whose names are supposed to show up as tip labels, and they are indeed sequences rather than taxa, which may explain why sequences were used as tip labels in ITOL. With plot_tree in phyloseq everything is fine, so why was the taxonomic information lost when the tree was exported with the ape function and imported into ITOL?

Does anyone know why this happens and how to fix it?

Thanks a lot!

Jing

rhockney commented 7 years ago

Hello Jing, I am having the same problem when following the Bioconductor Workflow. Have you managed to resolve the issue? Rochelle

spholmes commented 7 years ago

Hi Rochelle and Jing, You will need to rename your tips to something shorter before putting them into the phylogenetic tree program. Naming tips by their sequence is a feature explained in detail here: http://www.nature.com/ismej/journal/vaop/ncurrent/full/ismej2017119a.html A suggested approach, if you need to change the names to something shorter, is:

    library(dplyr)
    long_names <- names(seqs) # keep the long names for later
    short_names <- substr(names(seqs), 1, 5) %>% make.names(unique = TRUE)
    names(seqs) <- short_names

See also http://web.stanford.edu/class/bios221/MicrobiomeWorkflowII.html
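A minimal sketch of the renaming approach above, using base R only and a toy `seqs` vector in place of the workflow's real one. The `name_map` lookup table is a hypothetical addition, not part of the original suggestion; it is one way to keep the link from each short tip label back to its full sequence.

```r
# Toy stand-in for the workflow's `seqs` vector (names equal to the sequences).
seqs <- c("ATGCCCCCGT", "ATGCCTTTGA", "GGGCATTTAC")
names(seqs) <- seqs

# Keep the full sequences, then build short, unique labels from the first 5 bases.
long_names <- names(seqs)
short_names <- make.names(substr(long_names, 1, 5), unique = TRUE)

# Hypothetical lookup table: short tip label -> full sequence.
name_map <- data.frame(short = short_names,
                       long = long_names,
                       stringsAsFactors = FALSE)

names(seqs) <- short_names

# Recover the full sequence behind a given short label.
full_seq <- name_map$long[match(short_names[2], name_map$short)]
```

Because `make.names(unique = TRUE)` disambiguates labels that share the same prefix, saving `name_map` alongside the exported tree lets anyone trace a tip back to its exact sequence.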


-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

hrogal commented 7 years ago

I think the question is how to show (not change) the taxa labels from the taxonomic assignment when working with the phyloseq object. I think Joey's reply of May 20, 2013 here (https://github.com/joey711/phyloseq/issues/213) covers what you (and I) were looking for. However, the point about hardcoded ASVs is really important: you (and everyone else) should be able to go to the ASV sequence to check whether the OTU assignment you are showing is the same as theirs.
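One possible way to get taxonomic labels onto the exported tips while keeping the ASV sequences recoverable is sketched below with the ape package and toy data. This is an illustration under stated assumptions, not necessarily what the linked reply recommends. In a real workflow the tree and taxonomy would presumably come from `phy_tree(ps)` and `tax_table(ps)`. The `id_map` table and the `Genus_index` label format are hypothetical choices.

```r
library(ape)

# Toy stand-ins for the tree and taxonomy table of a phyloseq object.
tree <- read.tree(text = "((ATGCCCCCGT:0.1,ATGCCTTTGA:0.2):0.3,GGGCATTTAC:0.4);")
tax <- matrix(c("Bacteroides", "Bacteroides", "Prevotella"),
              ncol = 1,
              dimnames = list(c("ATGCCCCCGT", "ATGCCTTTGA", "GGGCATTTAC"),
                              "Genus"))

# Look up each tip's genus; fall back to "Unassigned" where taxonomy is NA.
genus <- tax[tree$tip.label, "Genus"]
genus[is.na(genus)] <- "Unassigned"

# Tip labels should stay unique, so append a running index to each genus.
# The table keeps the link from every new label back to its ASV sequence.
id_map <- data.frame(sequence = tree$tip.label,
                     label = paste0(genus, "_", seq_along(genus)),
                     stringsAsFactors = FALSE)

# Relabel a copy of the tree so the original (sequence-named) tree is untouched.
relabelled <- tree
relabelled$tip.label <- id_map$label

# Returns the Newick string; pass file = "tree_for_itol.nwk" to save it instead.
newick <- write.tree(relabelled)
```

Only the exported copy is renamed, so the tree in the phyloseq object still matches the taxa names of the other components, and `id_map` can be shared with the tree so readers can check each label against its sequence.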