grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation

phyloseq filtering errors #243

Open raw937 opened 6 years ago

raw937 commented 6 years ago

y <- parse_phyloseq(fil_nifH)
Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
2: The data set "4" is named, but not named by taxon ids.

y

5163 taxa: aab. k__bacteria, aac. k__archaea ... hqp. s__borrelia spielmanii
5163 edges: NA->aab, NA->aac, aab->aad, aab->aae ... cre->hqm, bjf->hqn, blu->hqo, bwl->hqp
4 data sets:
  otu_table:
    # A tibble: 42,559 x 43
      taxon_id otu_id  JZ03601 JZ03602 JZ03603 JZ03701 JZ03702 JZ03703 JZ03801 JZ03802 JZ03803 JZ03901
    1 csu      OTU437…       0       1       0       0       0       0       1       0       0       0
    2 csv      OTU317…       0       0       0       0       2       0       0       0       0       0
    3 csw      OTU428…       0       0       0       2       0       0       0       0       0       0
    # ... with 4.256e+04 more rows, and 31 more variables: JZ03902, JZ03903,
    #   JZ03904, JZ04001, JZ04002, JZ04003, JZ04101, JZ04102,
    #   JZ04103, JZ04104, …
  tax_data:
    # A tibble: 42,559 x 8
      taxon_id kingdom     phylum            class                  order    family   genus  species
    1 csu      k__bacteria p__proteobacteria c__gammaproteobacteria o__ente… f__ente… g__en… s__enter…
    2 csv      k__bacteria p__proteobacteria c__gammaproteobacteria o__ente… f__ente… g__kl… s__klebs…
    3 csw      k__bacteria p__actinobacteria c__actinobacteria      o__acti… f__acti… g__th… s__therm…
    # ... with 4.256e+04 more rows
  sam_data:
    # A tibble: 41 x 42
      sample_ids otu_id  Chemical_Sample… Site  ARC   Season Sample_dates Yield Pea_Color Height Plot
    1 JZ03601    JZ03601               25 Hunt… Sout… Summer 08/05/16       877 Yellow        41 dryl…
    2 JZ03602    JZ03602               25 Hunt… Sout… Summer 08/05/16       877 Yellow        41 dryl…
    3 JZ03603    JZ03603               25 Hunt… Sout… Summer 08/05/16       877 Yellow        41 dryl…
    # ... with 38 more rows, and 31 more variables: sample_depth, AG_station,
    #   Variety, Organic_Matter, Moisture_Content, Nitrate_Nitrite,
    #   Ammonia, Av_Phosphorus, Av_Potassium, Sulfate_Sulfur, …
  phylo_tree:
    Phylogenetic tree with 42559 tips and 42558 internal nodes.
    Tip labels: OTU43775, OTU31754, OTU42881, OTU23254, OTU36873, OTU51405, ...
    Rooted; includes branch lengths.
0 functions:

subsetted <- filter_taxa(y, n_supertaxa > 1)
Error in access(physeq, "otu_table", TRUE) : otu_table slot is empty.

heat_tree(filter_taxa(y, name == "Archaea", subtaxa = TRUE),
          node_size = n_obs, node_label = name,
          node_color = n_obs, layout = "fruchterman-reingold")
Error in filter_taxa(y, name == "Archaea", subtaxa = TRUE) : unused argument (subtaxa = TRUE)

Unable to filter or subset. Help?

zachary-foster commented 6 years ago

Hello @raw937, thanks for the report! I am traveling right now, so it might be a few days before I can look into this.

zachary-foster commented 6 years ago

Hello @raw937, I am sorry for the delay.

I think the problem is that filter_taxa is the name of a function in phyloseq as well as the name of a function in taxa, so you have to specify that you want the taxa function using taxa::filter_taxa(...).

We are trying to find a solution to this inconvenience.

These warning messages are normal I think:

Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
2: The data set "4" is named, but not named by taxon ids.

because not all the data in a phyloseq object is associated with taxa.

raw937 commented 6 years ago

No worries mate. So huh?

x <- parse_phyloseq(GlobalPatterns)
Warning messages:
1: The following 17865 of 19216 input indexes have NA in their classifications: 1, 2, 4, 5, 6, 7, 8, 9, 10, 11 ... 19207, 19208, 19210, 19211, 19212, 19213, 19214, 19215, 19216
2: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
3: The data set "4" is named, but not named by taxon ids.

y <- parse_phyloseq(fil_nifH)
Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
2: The data set "4" is named, but not named by taxon ids.

heat_tree(y,
          node_size = n_obs,
          edge_color = k__archaea,
          node_label = taxon_names,
          node_color = n_obs,
          node_color_range = c("cyan", "magenta", "green"),
          edge_color_range = c("#555555", "#EEEEEE"),
          initial_layout = "reingold-tilford",
          layout = "davidson-harel",
          overlap_avoidance = 0.5)

Error in eval(x$expr, data, x$env) : object 'k__archaea' not found

heat_tree(filter_taxa(y, name == "k__archaea", subtaxa = TRUE), node_size = n_obs, node_label = name, node_color = n_obs, layout = "fruchterman-reingold")

Error in filter_taxa(y, name == "k__archaea", subtaxa = TRUE) : unused argument (subtaxa = TRUE)

subsetted <- filter_taxa(y, n_supertaxa > 1)
Error in access(physeq, "otu_table", TRUE) : otu_table slot is empty.

I can send input data if you would like?

zachary-foster commented 6 years ago

You need to replace filter_taxa with taxa::filter_taxa, or load metacoder after phyloseq. You are calling the phyloseq function filter_taxa on a taxmap object, which is causing the error. You need the filter_taxa from the taxa package, which is loaded when you load metacoder.

Also, name needs to be taxon_names.

try:

heat_tree(taxa::filter_taxa(y, taxon_names == "k__archaea", subtaxa = TRUE),
          node_size = n_obs,
          node_label = taxon_names,
          node_color = n_obs,
          layout = "fruchterman-reingold")

The :: specifies which package a function comes from. It is only needed when there are multiple functions with the same name from different packages or you want to call a function without loading the package it is in.
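To see which package a bare function name resolves to in the current session, base R can report it directly. The following is a minimal sketch using only base R; where_defined is a hypothetical helper name, and stats::filter is used for the demonstration simply because it ships with every R installation.

```r
# Hypothetical helper: list, in search-path order, every attached package
# or environment that defines a function called `fn_name`.
# The first entry is the one a bare call resolves to.
where_defined <- function(fn_name) {
  utils::find(fn_name, mode = "function")
}

# Demonstration with a function that ships with base R.
print(where_defined("filter"))

# With phyloseq and metacoder both attached, the same check applies to the
# conflict in this thread (left commented out because it needs both packages):
# where_defined("filter_taxa")
# environmentName(environment(filter_taxa))
```

If the first entry is not the package you expect, prefix the call with the package name and ::. Which package exports filter_taxa may differ between releases, so it is worth running this check against your own installation.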

raw937 commented 6 years ago

Thank you! That produced a tree = neat.

Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
2: The data set "4" is named, but not named by taxon ids.
3: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], : 'min' arg is deprecated. Use 'lower' instead.
4: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], : 'max' arg is deprecated. Use 'upper' instead.

How would I subset and build trees by sample type or a category in the metadata? Then plot the data in the same plot? I can subselect in phyloseq.

zachary-foster commented 6 years ago

No problem!

All those warnings can be ignored. The first two come from the function that parse_phyloseq uses. I think I will suppress them in a future release, since they are expected most of the time. Normally you don't need to put data in a taxmap object that is not classified by a taxonomy, and that is what the warnings are complaining about, but I do it in this case to preserve all the info in phyloseq objects. The second two don't cause a problem and will go away in the next release. They appear because a dependency changed recently.
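Until such warnings are suppressed upstream, known warnings can be silenced selectively while anything unexpected still shows. This is a sketch in base R only: quiet_expected is a hypothetical helper, and fake_parse is a stand-in that imitates the warnings quoted in this thread, so the example runs without phyloseq or metacoder installed.

```r
# Stand-in for parse_phyloseq(): raises the two expected warnings from this
# thread plus one made-up warning that should NOT be hidden.
fake_parse <- function() {
  warning('There is no "taxon_id" column in the data set "3", so there are no taxon IDs.')
  warning('The data set "4" is named, but not named by taxon ids.')
  warning("something unexpected")
  "parsed"
}

# Hypothetical helper: evaluate `expr`, muffling only the warnings whose
# message contains one of `patterns` (matched as fixed strings).
quiet_expected <- function(expr, patterns) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      msg <- conditionMessage(w)
      hit <- vapply(patterns, grepl, logical(1), x = msg, fixed = TRUE)
      if (any(hit)) invokeRestart("muffleWarning")
    }
  )
}

expected <- c('There is no "taxon_id" column',
              "is named, but not named by taxon ids")

# Only the "something unexpected" warning is reported.
result <- quiet_expected(fake_parse(), expected)
```

In real use, fake_parse() would be replaced by the actual parse_phyloseq call. Wrapping the call in suppressWarnings() is shorter, but it would also hide warnings you do want to see.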

How would I subset and build trees by sample type or a category in the metadata?

There are a few ways. It depends on what you want to plot. Read abundance? Number of OTUs? Number of samples with reads? Comparisons between groups of samples? Using the groups option of functions like calc_taxon_abund and calc_n_samples is probably the easiest method. Look through these and let me know if you still have questions:

https://grunwaldlab.github.io/analysis_of_microbiome_community_data_in_r/04--manipulating.html
https://grunwaldlab.github.io/analysis_of_microbiome_community_data_in_r/05--plotting.html

The examples in the help docs for the functions are also useful. Type ?calc_taxon_abund and run the examples at the bottom.
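As a rough illustration of per-group abundance, the sketch below uses the hmp_otus and hmp_samples example tables bundled with metacoder rather than the data from this thread. It assumes a metacoder version in which parse_tax_data is available after library(metacoder), and it assumes the sample_id and body_site column names of hmp_samples; adjust these for your own metadata.

```r
library(metacoder)

# Build a taxmap object from the bundled example OTU table.
# Each lineage string is split on ";" into one taxon per rank.
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";")

# Sum OTU counts up the taxonomy, pooling sample columns by body site.
# `cols` names the sample columns; `groups` gives one group label per column.
x$data$tax_abund <- calc_taxon_abund(
  x, "tax_data",
  cols   = hmp_samples$sample_id,
  groups = hmp_samples$body_site
)

# One row per taxon, one column per group.
print(x$data$tax_abund)

# For an object returned by parse_phyloseq, the equivalent call would
# presumably use the "otu_table" data set and a metadata column, e.g.
# calc_taxon_abund(y, "otu_table",
#                  cols   = y$data$sam_data$sample_ids,
#                  groups = y$data$sam_data$Season)
```

A group column in the resulting table can then be referenced by name in heat_tree, for example as node_color, to draw one tree per group.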

Then plot the data in the same plot?

I am not sure what you mean. Multiple trees in the same plot or multiple statistics with the same tree?

raw937 commented 6 years ago

Again thank you! I will start here then if I get stuck I will message back.

zachary-foster commented 6 years ago

Closing for now. Let me know if you have other issues

raw937 commented 6 years ago

New error -

heat_tree(taxa::filter_taxa(y, taxon_names == "k__archaea", subtaxa = TRUE),
          node_size = n_obs,
          node_label = taxon_names,
          node_color = n_obs,
          layout = "fruchterman-reingold")
NULL
Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
2: The data set "4" is named, but not named by taxon ids.
3: In heat_tree.default(taxon_id = character(0), supertaxon_id = character(0), : 'taxon_id' and 'supertaxon_id' are empty. Returning NULL.

raw937 commented 6 years ago

heat_tree(taxa::filter_taxa(y, taxon_names == "k__archaea", subtaxa = TRUE), node_size = n_obs, node_label = taxon_names, node_color = n_obs, layout = "fruchterman-reingold")

zachary-foster commented 6 years ago

What does taxa::filter_taxa(y, taxon_names == "k__archaea", subtaxa = TRUE) return? I have a feeling there are no taxa left for some reason.
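One way to check this before plotting is to count the taxa that survive the filter. The sketch below again uses the bundled hmp_otus table as a stand-in for y, and assumes only metacoder is attached, so that a bare filter_taxa resolves to the taxmap version; with phyloseq attached, a package prefix would be needed as discussed above.

```r
library(metacoder)

# Example object standing in for `y` from this thread.
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";")

# Inspect how names are actually spelled; the comparison below is exact,
# so case and any rank prefix must match.
print(head(x$taxon_names()))

wanted <- "k__archaea"
kept <- filter_taxa(x, taxon_names == wanted, subtaxa = TRUE)

# Count the remaining taxa and only plot when there is something to draw.
n_left <- length(kept$taxon_ids())
if (n_left == 0) {
  message("No taxa matched '", wanted, "'; nothing to plot.")
} else {
  heat_tree(kept,
            node_size = n_obs,
            node_label = taxon_names,
            node_color = n_obs)
}
```

An empty result here corresponds to the "'taxon_id' and 'supertaxon_id' are empty. Returning NULL." warning shown earlier in the thread.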

raw937 commented 6 years ago

Ah, you're right, nothing there! Sorry :-(

Madegwa commented 4 years ago

Hello, I created a phyloseq object using qiime2R and used the parse_phyloseq command so I could use metacoder. When I try to create a heatmap, I get a similar error:

x <- parse_phyloseq(phy)

heat_tree(x, node_size = n_obs, node_color = n_obs, node_label = taxon_ids, tree_label = taxon_ids)

Error: The data set "data" is named, but not named by taxon ids.

Any help will be highly appreciated.

zachary-foster commented 4 years ago

Hi,

Can you send me the phyloseq object with saveRDS so I can see what is going on?

zachary-foster commented 4 years ago

Thanks!

It was a bug that only happens when you have an object with a table that has columns named by numbers. I fixed it, but you will have to reinstall the taxa package from Github:

devtools::install_github("ropensci/taxa")

After install, rerun your code and let me know if it is still a problem. Thanks

Madegwa commented 4 years ago

It worked! Thank you very much for your help. I am very grateful.
