joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Only two taxonomic ranks present in data imported from .biom file #336

Closed kmikkels closed 10 years ago

kmikkels commented 10 years ago

I have successfully imported my .biom file from QIIME, but only two taxonomic ranks appear to be recognized. When I created my OTU table in QIIME I used the newest version of greengenes (13_8). Maybe it uses different prefixes? Here is my script:

R version 3.1.0 (2014-04-10) -- "Spring Dance"
library("phyloseq")
packageVersion("phyloseq")
[1] ‘1.8.1’
library("ggplot2")
packageVersion("ggplot2")
[1] ‘0.9.3.1’
theme_set(theme_bw())

otu_file = "/users/kristin/Documents/Pine_Beetle_Research/Microbio/Phyloseq/sorted_otu_table.biom"
OTU_table = import_biom(otu_file, parseFunction = parse_taxonomy_greengenes)
print(OTU_table)
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 144428 taxa and 43 samples ]
tax_table()   Taxonomy Table:    [ 144428 taxa by 2 taxonomic ranks ]

rank_names(OTU_table)
[1] "Kingdom" "Rank1"

get_taxa_unique(OTU_table, "phylum")
Error in tax_table(as(x, "matrix")[i, j, drop = FALSE]) :
  error in evaluating the argument 'object' in selecting a method for function 'tax_table': Error in as(x, "matrix")[i, j, drop = FALSE] : subscript out of bounds

get_taxa_unique(OTU_table, "Kingdom")
[1] "Bacteria" NA         "Archaea"

gpt = subset_taxa(OTU_table, Kingdom == 'Bacteria')

plot_heatmap(gpt, sample.label = 'Sampletype')
Error in get_variable(physeq, sample.label) :
  Your phyloseq data object does not have a sample-data component
Try ?sample_data for more details.
In addition: Warning messages:
1: In metaMDS(veganifyOTU(physeq), distance, ...) :
  Stress is (nearly) zero - you may have insufficient data
2: In postMDS(out$points, dis, plot = max(0, plot - 1), ...) :
  skipping half-change scaling: too few points below threshold

plot_bar(gpt, fill = 'Genus')
Error in eval(expr, envir, enclos) : object 'Genus' not found

Maybe I used the wrong parse function when I imported my .biom file? When I use this same .biom file in QIIME I am able to get the taxonomy bar graphs for all levels, so I know the information is at least there in the .biom file.

Thanks for any help! Kristin
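The `get_taxa_unique()` error in the session above is, at bottom, a matrix-subscript failure: the message shows the taxonomy table being indexed as `as(x, "matrix")[i, j, drop = FALSE]`, and asking for a column name that the matrix does not have gives "subscript out of bounds". The base-R sketch below reproduces the same message with a toy two-column taxonomy matrix shaped like the imported one (the OTU IDs and values are hypothetical, and `has_rank()` is a hypothetical helper, not a phyloseq function). Rank names are also case-sensitive, so "phylum" would fail even on a table that did have a "Phylum" column.

```r
# Toy taxonomy matrix with the same two columns that rank_names() reported.
# OTU IDs and values are hypothetical.
tax <- matrix(
  c("Bacteria", NA,
    "Archaea",  NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("otu1", "otu2"), c("Kingdom", "Rank1"))
)

# Subsetting by an existing column name works.
kingdoms <- unique(tax[, "Kingdom"])

# Subsetting by a missing (or differently capitalised) column name fails
# with the same "subscript out of bounds" message seen in the session.
lookup_error <- tryCatch(
  tax[, "phylum", drop = FALSE],
  error = function(e) conditionMessage(e)
)

# Hypothetical defensive check before asking for a rank.
has_rank <- function(taxmat, rank) rank %in% colnames(taxmat)

print(kingdoms)
print(lookup_error)
print(has_rank(tax, "Phylum"))
```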

joey711 commented 10 years ago

I updated your question to be more precise (taxonomic ranks, not taxa). Also, greengenes 13_5 is the latest release; there is no 13_8.

Meanwhile, I would need to see an example of your data, especially the form of the taxonomy in the .biom file. You also did not provide the version of QIIME that you used. Sadly, they tend to be a moving target with file formatting.

kmikkels commented 10 years ago

Hi,

Thanks for looking at this so fast. I am using QIIME 1.8.0 (MacQIIME), which includes an upgrade to the GG 13_8 reference OTUs, so is it possible that this is a very new version?

I attached my .biom file, but it appears to be too large for your server to accept. Do you have Dropbox or something similar so I can send it to you?

~Kristin

(Email reply to Paul J. McMurdie's comment of Tue, Apr 29, 2014 at 11:57 AM.)

kmikkels commented 10 years ago

Let's see if it works compressed. I think this should go through.

Thanks again for all your help! ~Kristin

(Follow-up to her own email of Tue, Apr 29, 2014 at 12:14 PM.)

anmwinter commented 10 years ago

I am using QIIME 1.8 with gg_13_8 and phyloseq 1.8.1 with this workflow:

biom_file = "sort_no_ooze_otu_table.biom" # This is your .biom file
map_file = "lavabeds_mapping.csv" # This is your mapping file with all the metadata
tree_file = "rep_set_tree.tre" # This is the tree built after assigning all taxonomy

tree <- read_tree(tree_file)
map <- import_qiime_sample_data(map_file)

parashant <- import_biom(biom_file, tree_file, parseFunction = parse_taxonomy_greengenes)
warnings() # show any warnings raised during the import

# Assumed intent: check that sample names agree between the .biom data and the mapping file
intersect(sample_names(parashant), sample_names(map))
parashant <- merge_phyloseq(parashant, map)

parashant

ntaxa(parashant)
sample_names(parashant)
rank_names(parashant)
sample_variables(parashant)
otu_table(parashant)[1:10, 1:5]
tax_table(parashant)[1:10, 1:5]

get_taxa_unique(parashant, "Phylum")

And all the taxonomic ranks are imported. Not sure if any of that helps!

joey711 commented 10 years ago

Kristin,

You need to check the prefixes (if any) and the general structure of the taxonomy in your file. biom-format files are fairly human-readable, so this is not a difficult task. If the taxonomy in your file differs from the norm for greengenes, you should explain how and why, so we can discuss whether anything needs to be done or whether it is a QIIME issue. It is still very unclear from your description.

I should point out that it is shady that QIIME has a more recent version of GG included than is available at the "official" public repository: http://greengenes.secondgenome.com/downloads

I'm hoping someone from that team can comment. Would be nice to know the story there.
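One quick way to check the prefixes, as suggested above, is to tabulate the leading rank prefix of every taxonomy string and the number of ranks per OTU. The base-R sketch below assumes the taxonomy has already been pulled out of the .biom file into a list of character vectors, one per OTU; `gg_prefix()` and `summarise_prefixes()` are hypothetical helpers (not phyloseq functions), and the example vectors are made up in greengenes style.

```r
# Hypothetical helper: return the greengenes-style prefix ("k__", "p__", ...)
# of each taxonomy string, or "<none>" when no such prefix is present.
gg_prefix <- function(x) {
  ifelse(grepl("^[[:alpha:]]__", x), substr(x, 1, 3), "<none>")
}

# Hypothetical helper: count prefixes, and ranks per OTU, across all OTUs.
summarise_prefixes <- function(taxlist) {
  list(
    prefixes = table(unlist(lapply(taxlist, gg_prefix))),
    ranks_per_otu = table(vapply(taxlist, length, integer(1)))
  )
}

# Made-up example: two prefixed OTUs and one unassigned OTU.
taxlist <- list(
  otuA = c("k__Bacteria", "p__Firmicutes", "c__Bacilli"),
  otuB = c("k__Archaea", "p__Euryarchaeota", "c__"),
  otuC = c("Unassigned")
)

summary_out <- summarise_prefixes(taxlist)
print(summary_out$prefixes)
print(summary_out$ranks_per_otu)
```

OTUs with no prefix, or with a different number of ranks from the rest, are the entries worth looking at first.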

kmikkels commented 10 years ago

That is very strange that QIIME has incorporated a more recent version of GG than is available on the official GG site. Not sure what is going on there.

As for the structure of my .biom file, it is a matrix of numbers (i.e., [108,14,1] all the way to [144424,14,1]) followed by: "rows": [{"id": "denovo84068", "metadata": {"taxonomy": ["k__Bacteria", "p__Cyanobacteria", "c__Chloroplast", "o__Stramenopiles", "f__", "g__", "s__"]}},{"id": "denovo84069", "metadata": {"taxonomy": ["Unassigned"]}},{"id": "denovo84066", "metadata": {"taxonomy": ["k__Bacteria", "p__Acidobacteria", "c__Solibacteres",

This goes on for quite a long time, listing the taxonomy of each OTU, which I believe is representative of GG classifications/prefixes.
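The "matrix of numbers" at the top of the file is most likely the count data in sparse form. Assuming the file uses the BIOM 1.0 sparse layout, each entry such as [108,14,1] is a zero-based [row, column, value] triple: OTU row 108, sample column 14, count 1. The base-R sketch below expands such triples into an ordinary OTU-by-sample matrix; `triples_to_dense()` is a hypothetical helper and the triples are toy values, not taken from this file.

```r
# Hypothetical helper: expand zero-based [row, column, value] triples
# (BIOM 1.0 "sparse" layout, assumed) into a dense count matrix.
triples_to_dense <- function(triples, n_rows, n_cols) {
  trip <- do.call(rbind, triples)   # one triple per row
  counts <- matrix(0, nrow = n_rows, ncol = n_cols)
  # Convert zero-based indices to R's one-based indices and fill in values;
  # cells not listed in any triple stay at zero.
  counts[cbind(trip[, 1] + 1, trip[, 2] + 1)] <- trip[, 3]
  counts
}

# Toy data: 3 OTUs x 2 samples.
triples <- list(c(0, 1, 5), c(2, 0, 3), c(1, 1, 1))
dense <- triples_to_dense(triples, n_rows = 3, n_cols = 2)
print(dense)
```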

(Email reply to Paul J. McMurdie's comment of Tue, Apr 29, 2014 at 2:47 PM.)

joey711 commented 10 years ago

Kristin, email attachments do not get sent through the GitHub system, as far as I have seen. You will need to post the file at a web-accessible link. I actually recommend creating or finding a very small (minimal) version of this data that reproduces the same problem. Then you don't have to share your original data, and it will be easier for all of us to locate the issue.

Nothing in the header you posted jumps out at me as a problem, or even as different from previous versions of greengenes that phyloseq has imported just fine.

kmikkels commented 10 years ago

I actually followed the workflow suggested by bioinfonm above and merged my tree, map and biom files. This seems to have worked quite well and I am now able to read all the taxonomic ranks in my data.

Thanks!

(Email reply to Paul J. McMurdie's comment of Mon, May 5, 2014 at 6:18 PM.)

joey711 commented 10 years ago

Kristin,

I'm glad that worked, but it doesn't explain what your original problem was. The import_biom function doesn't parse the taxonomy differently just because you provided a tree. It can also take the map as a direct argument, helping you avoid the extra merge_phyloseq step.

I will close this issue for now because it sounds like you've solved your problem (and that the problem was not phyloseq's). However, it might be helpful to other users if you post here what went wrong in your first attempt.

Thanks for the feedback!