grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation
Other
134 stars 28 forks source link

Phyloseq heat tree doesn't see taxon_id or supertaxon_id when trying filter_taxa? #359

Open hank00000 opened 8 months ago

hank00000 commented 8 months ago

Apologies if this has been discussed before or is an easy fix!

I've just been introduced to metacoder and I want to be able to use a phyloseq object as an entry. I can get the code to work pretty well, but as mentioned in the documentation, a pretty large and hard to read heat tree is produced. One suggestion is to filter_taxa by a taxon rank to cut it down a bit, but my object (sourced from standard physique object) won't work to filter by supertaxon. It throws the following error:

Warning messages: 1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs. 2: The data set "4" is named, but not named by taxon ids. 3: In heat_tree.default(taxon_id = character(0), supertaxon_id = character(0), : 'taxon_id' and 'supertaxon_id' are empty. Returning NULL.

Data set 3 is my sample data so it makes sense that there is no taxon id. Data set 4 is the physeq tree that comes with the physique object.

How can I tell the filter_taxa function of the heat map to look in Data set 1 (out_table) or Data set 2 (taxa_data) for this? I know it's because I'm inputting a physeq, but it's much easier for me to do so in my workflow.

As is, I only have it as one line in the heat_tree code as so:

obj %>% + filter_taxa(taxon_ranks == "p") %>% + heat_tree(node_label = taxon_names, + node_size = n_obs, + node_color = n_obs, + initial_layout = "re", layout = "da")

Thank you!!

zachary-foster commented 8 months ago

These two warngings are to be expected for the reasons you say:

Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs. 
2: The data set "4" is named, but not named by taxon ids. 

Just a result of trying to force all the data in the phyloseq object in the taxmap object.

I think the issue might be that ranks are not named "p", by maybe something like "Phylum" in your data set? What does taxon_ranks(obj) return? If you used the wrong rank format and caused filter to remove all taxa as a result, you can get that error:

3: In heat_tree.default(taxon_id = character(0), supertaxon_id = character(0),  :
  'taxon_id' and 'supertaxon_id' are empty. Returning NULL.

I really should add a better error message for this case.

Take a look at obj %>% filter_taxa(taxon_ranks == "p") and see if any taxa are left.

hank00000 commented 8 months ago

Yes, there are "no taxa" according to the code you suggested, but these rank names as I used qiime2 to assign taxonomy (but then input the files into R to make graphs with):

dBacteria;pProteobacteria;cGammaproteobacteria;oBurkholderiales;fComamonadaceae;gRhodoferax

However, I'm able to get heat trees made using the code as long as I don't ask it to filter taxa. So that's what's confusing me, I expected it to either not work or fully work, not half-work, haha. I'll attach a pic of what I was able to get from this dataset. 341f_primerset.pdf

zachary-foster commented 8 months ago

I am guessing that there is no taxon rank information in the taxmap object. If you don't see rank names in taxon_ranks(obj) then that information is not available to be filtered by. Try parsing the phyloseq object with this code:

parse_phyloseq(my_phyloseq_obj, class_regex = "(^[a-z]+)__(.+$)", class_key = c("taxon_rank", "taxon_name"))

This, or something like it, should make the rank information available and then the filter should work.

hank00000 commented 8 months ago

Thanks for your help Zachary

When I do that, I get this error:

Error in parse_tax_data(tax_data = tax_data, datasets = datasets, class_cols = tax_cols, : "named_by_rank = TRUE" does not make sense when "taxon_rank" is in "class_key".

zachary-foster commented 8 months ago

What does taxon_ranks(obj) return with your original code?

hank00000 commented 8 months ago

head(taxon_ranks(obj)) aab aac aae aaf aag aah "Kingdom" "Kingdom" "Phylum" "Phylum" "Phylum" "Phylum" It goes all the way to genus which matches my data set.

When I run it with the standard settings for parse_phyloseq

obj <- parse_phyloseq(phyR, class_regex = "(.*)", class_key = "taxon_name")

zachary-foster commented 8 months ago

Ok, so it should be parsing the rank information correctly. What does this return?

obj %>% filter_taxa(taxon_ranks == "Phylum")
hank00000 commented 8 months ago

You got it!! I was filtering based on 'p' because that's what the original data had, but if I filter by "Phylum" it works. Thank you so much for your help.

PS: The same errors are thrown about Dataset 3/Dataset 4, but the plot does filter by Phylum. Warning messages: 1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs. 2: The data set "4" is named, but not named by taxon ids.

obj <- parse_phyloseq(phyR, class_regex = "(.*)", class_key = "taxon_name") set.seed(2) # Each number will produce a slightly different result for some layouts obj %>% filter_taxa(taxon_ranks == "Phylum") %>% # subset to the class rank heat_tree(node_label = taxon_names, node_size = n_obs, node_color = n_obs, initial_layout = "re", layout = "da")

output_file = "plot_example.pdf")

zachary-foster commented 8 months ago

No problem!