Closed kmikkels closed 10 years ago
I updated your question to be more precise (taxonomic ranks, not taxa; and greengenes 13_5 is the latest, there is no 13_8).
Meanwhile, I would need to see an example of your data, especially the form of the taxonomy in the .biom file. You also did not provide the version of QIIME that you used. Sadly, they tend to be a moving target with file formatting.
Hi,
Thanks for looking at this so fast. I am using (macqiime) QIIME 1.8.0 and it includes an upgrade to GG reference OTUs 13_8, so it's possible that this is a really new version?
I attached my .biom file, but it appears to be too big for your server to accept. Do you have Dropbox or something along those lines so I can send it to you?
~Kristin
Let's see if it works compressed. I think this should go through.
Thanks again for all your help! ~Kristin
I am using QIIME 1.8 with gg_13_8 and phyloseq 1.8.1 with this work flow:
biom_file = "sort_no_ooze_otu_table.biom"  # This is your .biom file
map_file = "lavabeds_mapping.csv"  # This is your mapping file with all the metadata
tree_file = "rep_set_tree.tre"  # This is the tree built after assigning all taxonomy
tree <- read_tree(tree_file)
map <- import_qiime_sample_data(map_file)
parashant <- import_biom(biom_file, tree_file, parseFunction = parse_taxonomy_greengenes)
warnings(parashant)
intersect(parashant)
parashant <- merge_phyloseq(parashant, map)
parashant
ntaxa(parashant)
sample_names(parashant)
rank_names(parashant)
sample_variables(parashant)
otu_table(parashant)[1:10, 1:5]
tax_table(parashant)[1:10, 1:5]
get_taxa_unique(parashant, "Phylum")
And all the taxa ranks are pulled in. Not sure if any of that helps!
Kristin,
You need to check the prefixes (if any) and general structure of the taxonomy in your file. biom-format files are fairly human-readable so this is not a difficult task. If the taxonomy in your file differs from the norm for greengenes, you should explain how and why so we can discuss whether anything needs to be done, or if it is a QIIME issue. It is still very unclear from your description.
I should point out that it is shady that QIIME has a more recent version of GG included than is available at the "official" public repository: http://greengenes.secondgenome.com/downloads
I'm hoping someone from that team can comment. Would be nice to know the story there.
That is very strange that QIIME has incorporated a more recent version of GG than is available on the official GG site. Not sure what is going on there.
As for the structure of my .biom file, it is a matrix of numbers (i.e., [108,14,1] going all the way to [144424,14,1]) followed by: "rows": [{"id": "denovo84068", "metadata": {"taxonomy": ["k__Bacteria", "p__Cyanobacteria", "c__Chloroplast", "o__Stramenopiles", "f__", "g__", "s__"]}},{"id": "denovo84069", "metadata": {"taxonomy": ["Unassigned"]}},{"id": "denovo84066", "metadata": {"taxonomy": ["k__Bacteria", "p__Acidobacteria", "c__Solibacteres",
This goes on for quite a long time and continues to list the taxonomy for each OTU, which I believe is representative of GG classifications and prefixes.
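For anyone wanting to check the taxonomy structure without reading the whole file by eye, here is a minimal base-R sketch. It assumes the taxonomy entries have already been pulled out of the .biom "rows" metadata into a named list, and that they use the usual Greengenes `k__`, `p__`, ... prefixes; the object names `taxonomy`, `n_ranks` and `rank_prefix` are made up for illustration and are not part of phyloseq or QIIME.

```r
# Hypothetical taxonomy entries, modeled on the "rows" metadata of a QIIME .biom file.
# The k__/p__/... prefixes follow the usual Greengenes convention (an assumption here).
taxonomy <- list(
  denovo84068 = c("k__Bacteria", "p__Cyanobacteria", "c__Chloroplast",
                  "o__Stramenopiles", "f__", "g__", "s__"),
  denovo84069 = c("Unassigned")
)

# Number of rank entries recorded for each OTU.
n_ranks <- vapply(taxonomy, length, integer(1))

# Single-letter prefix in front of the double underscore, or NA when there is none.
rank_prefix <- function(x) {
  has_prefix <- grepl("^[a-z]__", x)
  ifelse(has_prefix, substr(x, 1, 1), NA_character_)
}
prefixes <- lapply(taxonomy, rank_prefix)

print(n_ranks)
print(prefixes)
```

OTUs whose entry count or prefixes differ from the rest (for example a lone "Unassigned") are the ones worth looking at when the imported table has fewer ranks than expected.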
Kristin, email attachments do not get sent through the GitHub system, as far as I have seen. You will need to post it to a web-accessible link. I actually recommend creating or finding a very small (minimal) version of this data that reproduces the same problem. Then you don't have to share your original data, and it will be easier for all of us to locate the issue.
Nothing in the header you posted jumps out at me as a problem, or even as being any different from previous versions of greengenes that have been imported by phyloseq just fine.
I actually followed the workflow suggested by bioinfonm above and merged my tree, map and biom files. This seems to have worked quite well and I am now able to read all the taxonomic ranks in my data.
Thanks!
Kristin,
I'm glad that worked, but it doesn't explain what your original problem was. The import_biom function doesn't parse the taxonomy differently just because you provided a tree. It also can take the map as a direct argument, helping you avoid the extra merge_phyloseq step.
I will close this issue for now because it sounds like you've solved your problem (and that the problem was not phyloseq's). However, it might be helpful to other users if you post here what went wrong in your first attempt.
Thanks for the feedback!
I have successfully imported my .biom file from QIIME but it appears that there are only two taxa that are recognized. When I was creating my OTU file in QIIME I used the newest version of greengenes (13_8) that maybe has different prefixes? Here is my script below:
R version 3.1.0 (2014-04-10) -- "Spring Dance"
library("phyloseq")
packageVersion("phyloseq")
[1] ‘1.8.1’
library("ggplot2")
packageVersion("ggplot2")
[1] ‘0.9.3.1’
theme_set(theme_bw())
otu_file = ("/users/kristin/Documents/Pine_Beetle_Research/Microbio/Phyloseq/sorted_otu_table.biom")
OTU_table = import_biom(otu_file, parseFunction = parse_taxonomy_greengenes)
print(OTU_table)
phyloseq-class experiment-level object
otu_table() OTU Table: [ 144428 taxa and 43 samples ]
tax_table() Taxonomy Table: [ 144428 taxa by 2 taxonomic ranks ]
rank_names(OTU_table)
[1] "Kingdom" "Rank1"
get_taxa_unique(OTU_table, "phylum")
Error in tax_table(as(x, "matrix")[i, j, drop = FALSE]) :
  error in evaluating the argument 'object' in selecting a method for function 'tax_table': Error in as(x, "matrix")[i, j, drop = FALSE] : subscript out of bounds
get_taxa_unique(OTU_table, "Kingdom")
[1] "Bacteria" NA "Archaea"
gpt = subset_taxa(OTU_table, Kingdom == 'Bacteria')
plot_bar(gpt, fill = 'Genus')
Error in eval(expr, envir, enclos) : object 'Genus' not found
Maybe I used the wrong parse function when I imported my .biom file? When I use this same .biom file in QIIME I am able to get the taxonomy bar graphs for all levels, so I know the information is at least there in the .biom file.
Thanks for any help! Kristin
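One way to test the parse function on its own, separately from the file import, is to run it on a single taxonomy vector and look at the rank names it assigns. The sketch below is only an illustration: the vector `tax_strings` is a hypothetical entry written with the usual Greengenes prefixes (an assumption, not data copied from the real file), and the check is skipped if phyloseq is not installed.

```r
# Hypothetical taxonomy vector for one OTU, using the usual Greengenes prefixes (assumed).
tax_strings <- c("k__Bacteria", "p__Cyanobacteria", "c__Chloroplast",
                 "o__Stramenopiles", "f__", "g__", "s__")

# phyloseq is a Bioconductor package; skip the check if it is not installed.
if (requireNamespace("phyloseq", quietly = TRUE)) {
  parsed <- phyloseq::parse_taxonomy_greengenes(tax_strings)
  # The names of the result are the rank labels the parser assigned.
  print(names(parsed))
  print(parsed)
} else {
  message("phyloseq is not installed; nothing to check.")
}
```

If the names come back as ranks such as Kingdom and Phylum for a vector like this, the parse function is probably not the culprit, and the form of the taxonomy in the file itself is the next thing to check.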