joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
582 stars 187 forks source link

file path problem, and tree tip label format problem #681

Closed Maya0801 closed 7 years ago

Maya0801 commented 7 years ago

Hi,

I am new to phyloseq, and I am having problem in importing the tree file.

The tree was constructed using qiime: make_phylogeny.py -i 6_align/seqs_derep_nochim_repotus_aligned_pfiltered.fasta -o 6_align/rep_set.tre --root_method tree_method_default --tree_method fasttree

and when I run the command:

tree <- read_tree("rep_set.tre", errorIfNULL=TRUE)

I get the error:

Error in read_tree("rep_set.tre", : tree file could not be read. Please retry with valid tree.

The tree doesn't upload also when I try to load it together with the biom file:

data <- import_biom("otu_table_blastn_s123_2s3t_nochl.biom", treefilename = "rep_set.tre")

Warning message: In import_biom("C:/Users/Maya/OneDrive for Business/PhD/מחקר/Theonella/16S_amplicon/Data/otu_table_blastn_s123_2s3t_nochl.biom", : treefilename failed import. It not included.

PS: I used silva for taxonomic assingment, should I specify this somewhere in the commands?

Thanks

dadahan commented 7 years ago

Hi, running the commands on a fasttree .tre file, I did not receive this error but when providing the incorrect filename/path I did. Are you sure you're providing the correct relative or absolute path and filename?

For example,

tree = read_tree("path/to/my/data/rep_set.tre")

If this doesn't work, you may want to give the ape package a try:

tree = ape::read.tree("path/to/my/data/rep_set.tre")
Maya0801 commented 7 years ago

I tried both:

tree = read_tree("C:/Users/Maya/OneDrive for Business/PhD/מחקר/Theonella/16S_amplicon/Data/rep_set.tre")

tree = ape::read.tree("C:/Users/Maya/OneDrive for Business/PhD/מחקר/Theonella/16S_amplicon/Data/rep_set.tre")

in the latter I got the error: Error in if (tp[3] != "") obj$node.label <- tp[3] : missing value where TRUE/FALSE needed

dadahan commented 7 years ago

You only need one to work. Did the first run without errors? If so, see what happens when that is imported with your biom table.

tree = read_tree("C:/Users/Maya/OneDrive for Business/PhD/מחקר/Theonella/16S_amplicon/Data/rep_set.tre")
Maya0801 commented 7 years ago

Hi Dylan, Thank you for your quick reply.

neither of them worked, when I ran:

tree = ape::read.tree("C:/Users/Maya/OneDrive for Business/PhD/מחקר/Theonella/16S_ampliconData/rep_set.tre")

I got the error: Error in if (tp[3] != "") obj$node.label <- tp[3] : missing value where TRUE/FALSE needed

When I ran:

tree = read_tree("C:/Users/Maya/OneDrive for Business/PhD/מחקר/Theonella/16S_amplicon/Data/rep_set.tre")

No error but the tree is NULL

dadahan commented 7 years ago

Alright, after some digging around it looks like the issue may be due to semi-colon delimitation of taxa names on the tips of a GreenGenes formatted tree. See here: #224 and here: 04dc433 and here: [Somewhat flexible tree-import function](Somewhat flexible tree-import function)

Try this:

tree = read_tree_greengenes("C:/Users/Maya/OneDrive for Business/PhD/מחקר/Theonella/16S_amplicon/Data/rep_set.tre")

Hope this helps. Happy to keep trying fixes.

dadahan commented 7 years ago

Just for clarity, this is a new issue and the initial one was fixed by providing the correct path.

Maya0801 commented 7 years ago

Hey Dylan,

Thanks for your help. that help the upload of the tree. However, I have another issue, because when we removed the semi colon from the tree the taxa names don't match up:

> taxa_names(tree)[1:10] [1] "'Maya5-196_26247><-><-><'" "'Maya18-339_2898><-><-><'"
[3] "'Maya56-N1-RNA_155><-><-><'" "'Maya19-352_13306><-><-><'" [5] "'Maya17-322_18014><-><-><'" "'Maya18-339_8960><-><-><'"
[7] "'Maya7-211_19062><-><-><'" "'Maya12-249_13377><-><-><'" [9] "'Maya7-211_16386><-><-><'" "'Maya11-248_4919><-><-><'"
> taxa_names(biomot)[1:10] [1] "Maya10-232_755;size=18702;" "Maya10-232_1049;size=12348;" "Maya10-232_708;size=18179;" [4] "Maya10-232_5044;size=22289;" "Maya10-232_228;size=17322;" "Maya10-232_6708;size=501;"
[7] "Maya10-232_266;size=10586;" "Maya10-232_4822;size=25616;" "Maya10-232_1196;size=8372;" [10] "Maya11-248_3276;size=3672;"

Should I convert the taxa names in the biom file? if yes, do you know a way to do it?

Thanks you so much for your help!

PS: I used the whole file path in my first post, I just wanted to simply it, so I removed the long path.

joey711 commented 7 years ago

This problem is because there is extra information being embedded in the field in the tree file that should just be the identity of the leaves, also called the "taxa name" in phyloseq. When you combine one data type with others in phyloseq, it takes the intersection of the IDs among them along the same dimension (samples or taxa). I can't see any reason you need the ;size=3672; information at all at this stage in the analysis. The counts should already be stored in your biom file, or anything else that represents an sequence/OTU-by-sample table.

I've probably done this before elsewhere in this forum, but you can write a regular expression that will remove everything from the first semi-colon to the end of the string for each of those tip names in the tree. I assume that is the difference between the tip IDs in this tree and the taxa names in your biom file. You'll have to check.

Some relevant commands

gsub
taxa_names
taxa_names() <-
read_tree
merge_phyloseq
phyloseq

Fixing the problem with the tree tip names can be done in R/phyloseq via taxa_names(). Once you've done this, the tree can be joined/added to the rest of your phyloseq object using phyloseq() or merge_phyloseq().

I'll close for now, but re-open if something else crops up.