joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

import_biom metadata issue and ERROR: In parseFunction(i$metadata$taxonomy) ... #303

Closed. anmwinter closed this issue 10 years ago.

anmwinter commented 10 years ago

Joey and Michelle,

Thanks again for all the suggestions!

I went through the examples with GlobalPatterns and everything turned out fine. When I use the import_biom function, though, two things come up.

1) import_biom isn't pulling in the sample metadata from the "rich" biom file I made in QIIME. I created it with the following command in MacQIIME:

biom add-metadata -i otu_table.biom -o rich_otu_table.biom --sample-metadata-fp VLmapping.csv

I checked the biom file in Sublime Text to verify that the sample data were attached correctly. As a workaround, I imported the mapping file separately and merged it into the phyloseq object. Not a big deal, just an extra line of code.
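The workaround depends on the sample IDs in the mapping file matching the sample (column) names of the OTU table. A minimal base-R sketch of that check, assuming a hypothetical toy count matrix `otu` and mapping table `map_df`, uses `intersect()` and `setdiff()`, each of which compares two vectors:

```r
# Hypothetical OTU count matrix: 3 taxa (rows) x 2 samples (columns),
# filled column by column
otu <- matrix(c(10, 0, 3, 5, 2, 0), nrow = 3,
              dimnames = list(c("OTU1", "OTU2", "OTU3"), c("VL1", "VL2")))

# Hypothetical mapping table with sample IDs as row names
map_df <- data.frame(Site = c("A", "B", "C"),
                     row.names = c("VL1", "VL2", "VL3"))

# intersect() and setdiff() each take two vectors to compare
shared      <- intersect(colnames(otu), rownames(map_df))
only_in_otu <- setdiff(colnames(otu), rownames(map_df))
only_in_map <- setdiff(rownames(map_df), colnames(otu))

shared       # sample IDs present in both tables
only_in_otu  # samples with counts but no metadata
only_in_map  # metadata rows without any counts
```

On real data the same comparison could be made between `sample_names()` of the imported object and of the `import_qiime_sample_data()` result, before calling `merge_phyloseq()`, so that any mismatch is visible up front.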

2) I am getting errors from the import_biom function. I used warnings() and intersect(), but I admit I am not sure what intersect() is doing. As you can see from the code below, I can still print the villaluz phyloseq object and everything shows up, and I can still plot alpha diversity and so on. Are these errors going to cause me any trouble?

Setup: OS X 10.7.x, R 3.x, RStudio 0.9x

QIIME 1.8. I created two biom files: one with metadata and one without. The same error shows up with both. The mapping file is the standard QIIME mapping file. Taxonomy was assigned using gg_13_8, and both biom files contain the Greengenes taxonomy (verified in Sublime Text).

require(ggplot2)
## Loading required package: ggplot2
library("ape")
library("plyr")
library("phyloseq"); packageVersion("phyloseq")
## [1] ‘1.7.20’

setwd("~/Desktop/PhD_Project/454_data/villa luz/")
biom_file = "otu_table.biom"  # or "rich_otu_table.biom"
map_file = "VLmapping.csv"
tree_file = "rep_set_tree.tre"
tree <- read_tree(tree_file)

# I think this is making an object from the mapping file
map <- import_qiime_sample_data(map_file)

# Here we create the phyloseq object with the biom and tree files from QIIME
villaluz <- import_biom(biom_file, tree, parseFunction = parse_taxonomy_greengenes)
## There were 50 or more warnings (use warnings() to see the first 50)

warnings(villaluz)
## Warning messages:
## 1: In parseFunction(i$metadata$taxonomy) :
##   No greengenes prefixes were found.
##   Consider using parse_taxonomy_default() instead if true for all OTUs.
##   Dummy ranks may be included among taxonomic ranks now.
## Error in cat(list(...), file, sep, fill, labels, append) :
##   argument 2 (type 'S4') cannot be handled by 'cat'

intersect(villaluz)
## Error in as.vector(y) : argument "y" is missing, with no default

# Merge the phyloseq objects so everything is in one object
villaluz <- merge_phyloseq(villaluz, map)
villaluz
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 3594 taxa and 7 samples ]
## sample_data() Sample Data:       [ 7 samples by 9 sample variables ]
## tax_table()   Taxonomy Table:    [ 3594 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 3594 tips and 3592 internal nodes ]

Downstream, plot_richness and plot_bar by phylum work fine.

Thanks!
Ara
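The two errors at the end of the transcript come from the calls themselves rather than from import_biom: `warnings()` passes any extra arguments on to `cat()`, so `warnings(villaluz)` hands `cat()` an S4 object it cannot print, and `intersect()` needs two vectors to compare. To inspect import warnings programmatically, one option is base R's `withCallingHandlers()`. The sketch below uses a hypothetical stand-in, `noisy_step()`, in place of `import_biom()` so it runs without data files; `collect_warnings()` is likewise a hypothetical helper name.

```r
# Hypothetical stand-in for a noisy import step such as import_biom()
noisy_step <- function() {
  warning("No greengenes prefixes were found.")
  warning("No greengenes prefixes were found.")
  warning("Some other warning")
  "done"
}

# Evaluate an expression and collect every warning message it raises
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # suppress the default printing
    }
  )
  list(value = value, warnings = msgs)
}

res <- collect_warnings(noisy_step())
length(res$warnings)  # total number of warnings raised
table(res$warnings)   # how often each distinct message appeared
```

With phyloseq loaded, the same wrapper could be placed around the call from the transcript, for example `collect_warnings(import_biom(biom_file, tree, parseFunction = parse_taxonomy_greengenes))`. At the console, plain `warnings()` with no arguments prints the stored list.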

joey711 commented 10 years ago

Ara,

I don't have any way to reproduce the errors you're describing. phyloseq runs lots of internal checks when it produces a new phyloseq data object, especially checks that the OTU and sample indices match. You are ultimately responsible for checking that your data look the way they should, so check that the labels and the sums of counts make sense. There are many functions in phyloseq to help you explore this, too many to list here; see the index of available functions in phyloseq. I typically start with things like

taxa_sums
sample_sums
variable_names
rank_names

But there are many others.
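For an OTU table stored with taxa as rows, `taxa_sums()` and `sample_sums()` amount to row and column sums of the count matrix. A minimal base-R sketch on a hypothetical toy matrix shows the kind of sanity checks these summaries support: taxa that were never observed, and the spread of sequencing depth across samples.

```r
# Hypothetical counts: 4 taxa (rows) x 3 samples (columns);
# the matrix is filled column by column, one line per sample
counts <- matrix(c(12, 0, 3, 0,
                   8, 0, 1, 4,
                   20, 0, 0, 2),
                 nrow = 4,
                 dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3)))

taxon_totals  <- rowSums(counts)  # analogous to taxa_sums() with taxa as rows
sample_depths <- colSums(counts)  # analogous to sample_sums()

names(taxon_totals)[taxon_totals == 0]  # taxa never observed in any sample
range(sample_depths)                    # smallest and largest sequencing depth
sum(counts) == sum(taxon_totals)        # both summaries cover the same reads
```

On a real phyloseq object the package functions can be called directly, for example `summary(sample_sums(villaluz))` or `rank_names(villaluz)`.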

I'm sorry that the sample data you added to the biom file was not recognized as valid during import. This has been an ongoing issue with QIIME output to the biom format, and there's not much I can do about it. As far as I can tell, there is no problem with the biom-format importer for R. Furthermore, since QIIME didn't include the sample data automatically, you might as well skip the step where you use a Python script to add the sample data "after the fact". Just import the sample mapping file with phyloseq, as you already did above.

The taxonomy warnings are from incomplete, missing, or wrong taxonomy entries in some of your data. The importer expects to find greengenes-formatted taxonomy entries, and complains if it doesn't. This doesn't mean there is anything wrong if you expect to have some missing entries. In some cases, people use the wrong parsing function to process the taxonomy entries, and so warnings of this kind during import are useful. For example, if the number of warnings equaled the number of OTUs, you would know for sure you had a problem.
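The rule of thumb above (some warnings can be expected; one warning per OTU is a clear problem) can be made concrete by counting taxonomy entries that lack Greengenes-style prefixes such as `k__` or `p__` and comparing that count with the number of OTUs. The sketch below is a simplified stand-in, not phyloseq's parser: it assumes one taxonomy string per OTU (hypothetical data) and uses a plain regular expression for a letter followed by two underscores.

```r
# Hypothetical taxonomy strings, one per OTU
taxonomy <- c(
  OTU1 = "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
  OTU2 = "k__Bacteria; p__Firmicutes",
  OTU3 = "Unassigned",
  OTU4 = "k__Archaea; p__Euryarchaeota"
)

# TRUE when an entry contains at least one prefix such as "k__" or "p__"
has_prefix <- grepl("[[:alpha:]]__", taxonomy)

n_missing <- sum(!has_prefix)
n_otus    <- length(taxonomy)

if (n_missing == n_otus) {
  message("No OTU has Greengenes prefixes; the parsing function is probably the wrong one.")
} else {
  message(n_missing, " of ", n_otus, " OTUs lack prefixes; inspect them:")
  print(names(taxonomy)[!has_prefix])
}
```

If every OTU lacks prefixes, the warning text itself points to `parse_taxonomy_default()` as the alternative parsing function.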

It looks like you mainly wanted confirmation that you have done things "correctly". It looks fine to me so far.

joey

bayjan commented 9 years ago

This is an old issue, but I think my comment might be helpful for others. I had the same problem when I imported a biom file into phyloseq using the following command:
import_biom(BIOMfilename=BIOMfilename, treefilename = treefilename, parseFunction=parse_taxonomy_greengenes)
I got the following warning: There were 47 warnings (use warnings() to see them)
Then I quickly checked the biom file with the following bash command:
sed -re 's/\{/\n{/g' otu_table_filtered.biom |grep 'taxonomy'|grep 'Unassigned'|wc -l
The output of that command was 47, the same as the number of warnings, so it must be the Unassigned entries that are causing the warnings during import.
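The same count can be taken from within R without depending on `sed` (the `-r` option and the `\n` in the replacement are GNU `sed` behavior and may not work the same way with the BSD `sed` on OS X). The sketch below mirrors the pipeline: it splits the file text at every `{` and counts the pieces that mention both `taxonomy` and `Unassigned`. It assumes a JSON-format biom file; the function name `count_unassigned` and the toy, deliberately incomplete biom-like file are hypothetical.

```r
# Count taxonomy entries equal to "Unassigned" in a JSON biom file,
# mirroring: sed 's/\{/\n{/g' | grep 'taxonomy' | grep 'Unassigned' | wc -l
count_unassigned <- function(biom_path) {
  txt <- paste(readLines(biom_path, warn = FALSE), collapse = "\n")
  # Split at each "{" so every metadata object becomes its own chunk
  chunks <- strsplit(txt, "{", fixed = TRUE)[[1]]
  sum(grepl("taxonomy", chunks, fixed = TRUE) &
        grepl("Unassigned", chunks, fixed = TRUE))
}

# Hypothetical toy file: biom-like JSON, not a complete valid biom table
toy_biom <- tempfile(fileext = ".biom")
writeLines(paste0(
  '{"id": "toy", "rows": [',
  '{"id": "OTU1", "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},',
  '{"id": "OTU2", "metadata": {"taxonomy": ["Unassigned"]}},',
  '{"id": "OTU3", "metadata": {"taxonomy": ["Unassigned"]}}',
  '], "columns": [{"id": "S1", "metadata": null}]}'
), toy_biom)

count_unassigned(toy_biom)  # compare with the number of import warnings
```

Like the `grep` pipeline, this is a quick text check rather than a JSON parser, so it assumes the words `taxonomy` and `Unassigned` only appear in the taxonomy metadata.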