grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation
Other
135 stars 28 forks source link

NAs found in the "node_color" option. #307

Closed mr2raccoon closed 3 years ago

mr2raccoon commented 3 years ago

First of all thanks for this amazing package @zachary-foster

I really enjoy the type of vizualization that is possible with this.

I am still having trouble though. I hope you can help.

My goal is to draw two individual heat trees for the two samples present in my object.

relobj.RData.zip

I managed more or less to draw the first one now showing all species with more than 0.01% of reads (I choose this value to make the heat tree more readable).

Rplot11.pdf

I have the percent of reads mapped onto the node_size, so it shows the species and their relative abundance nicely in the leafes. So far I am satifisfied.

The problem is though that I also would like to have the the higher taxonomic ranks displayed in their relative abundance (so that I am able to show that relatively the abundance of „Ascomycota“ is higher than „Basidiomycota“ etc.

The heat_tree though only draws if I mapp „n_obs“ onto „edge_color“ , „node_size“, respectively.

relobj <- parse_phyloseq(physeq_rel)  #### parsin phyloseq-object
relobjfil <- filter_obs(relobj, "otu_table", MB5_2020_nano > 0.01, drop_taxa = TRUE) ##### Filter for Sample MB5 and taxa with rel. abundance >0.01%
relobjfil$data$tax_abund <- calc_taxon_abund(relobjfil, "otu_table", groups=sample_data$sample_id)
relobjfil%>%
  taxa::filter_taxa(taxon_names == "Fungi", subtaxa = TRUE) %>% # subset of Fungi
  heat_tree( node_label = taxon_names,
             node_color = MB5_2020_nano,
             node_size = n_obs,
             node_size_range = c(0.015, 0.04),
             edge_color= n_obs, 
             title= "Species composition Nanopore Sample MB5 for Species > 0.01 %",
             title_size = 0.03,
             node_color_axis_label= "Read in Percent % ",
             edge_color_axis_label="Obs. per rank", 
             layout= "kamada-kawai")

When I plot the tree I get the following warning message:

Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs. 
2: NAs found in the "node_color" option. These may cause plotting errors or other strange behavior. NAs can be created if there is not a value for each taxon. The following 149 of 208 taxa have NAs for the "node_color" option:
  aaf, aal, aam, aao, aap, abn, abo, abp, abq, abr, abs, abt, abu, abv, abw ... atn, ato, atp, atq, atr, att, atu, atv, atw, atx, aty, aua, aub, aud

If I interpret this correctly it could not calculate the relative abundance for the higher ranks before, right?

I tried summing the per tax counts as shown in your workshop. When I try to execute the command

relobjfil$data$tax_abund <- calc_taxon_abund(relobjfil, "otu_table", groups=sample_data$sample_id)

I get the following error message.

Error in sample_data$sample_no : object of type 'closure' is not subsettable

I hope I could more or less clrearly communicate my problem and my goal.

Thanks a lot for your time and help!

zachary-foster commented 3 years ago

Thanks!

The issue here is that after you create the 'tax_abund' table, there are two tables with a column named "MB5_2020_nano", so heat_tree does not know which to use and uses the first one, which is in the OTU table and does not have data for the higher taxa, which is why you get the NA warning. Normally, you could do node_color = relobjfil$data$tax_abund$MB5_2020_nano to say which table to use, but you filtered the dataset right before with taxa::filter_taxa(taxon_names == "Fungi", subtaxa = TRUE), so the relobjfil$data$tax_abund$MB5_2020_nano would have a different set of taxa than the filtered version passed to heat_tree by %>%. To get around this, I renamed the columns in the "tax_abund" in the example below. There are other ways to get around this, but this is the easiest in this scenario.

library(metacoder)
#> Loading required package: taxa
#> This is metacoder verison 0.3.4.9002 (development version)
load('~/Downloads/relobj.RData')
relobjfil <- filter_obs(relobj, "otu_table", MB5_2020_nano > 0.01, drop_taxa = TRUE) ##### Filter for Sample MB5 and taxa with rel. abundance >0.01%
relobjfil$data$tax_abund <- calc_taxon_abund(relobjfil, "otu_table", cols=relobjfil$data$sample_data$sample_id, out_names = paste0(relobjfil$data$sample_data$sample_id, '_tax_abund'))
#> Summing per-taxon counts from 2 columns for 248 taxa
relobjfil%>%
  taxa::filter_taxa(taxon_names == "Fungi", subtaxa = TRUE) %>% # subset of Fungi
  heat_tree( node_label = taxon_names,
             node_color = MB5_2020_nano_tax_abund,
             node_size = n_obs,
             node_size_range = c(0.015, 0.04),
             edge_color= n_obs, 
             title= "Species composition Nanopore Sample MB5 for Species > 0.01 %",
             title_size = 0.03,
             node_color_axis_label= "Read in Percent % ",
             edge_color_axis_label="Obs. per rank", 
             layout= "kamada-kawai")
#> Warning: There is no "taxon_id" column in the data set "3", so there are no
#> taxon IDs.

Created on 2021-04-12 by the reprex package (v0.3.0)

I hope this helps!

mr2raccoon commented 3 years ago

Wow thanks a lot! I was able to make some amazing plots today!

mr2raccoon commented 3 years ago

One more question:

today I wanted to draw the heat trees for the rare taxa (<= 0.01%) in each sample. When I changed the operator MB6_2020_ill <= 0.01 I get more taxa that should be left in the column. Is there an explanation? Is reversing the operator the wrong approach?

zachary-foster commented 3 years ago

That should work. The higher taxonomic ranks (e.g., Fungi) will be present in both datasets, so the number of taxa in the > 0.01 plot and the number of taxa in the <= 0.01 plot will add up to something larger that your total dataset of 1069 taxa, since taxa like fungi are present in both datasets. They are included by default, since they are part of the classification of taxa that passed the <= 0.01 filter even though their abundance may be greater than 0.01. The supertaxa option of filter_obs controls if the higher taxa of lower taxa that pass the filter are included. Not including them would probably make the plot a mess.

mr2raccoon commented 3 years ago

The plot would then only include leaves, correct? It already displayed nicely yesterday when I tried but unfortunately it also shows all the species in the leaves with abundance=0. I tried to specify the range like this 0 < MB5_2020_ill <= 0.01 and some other ways. But nothing works. How can I specify a range inside this call?

Thanks so much again!

zachary-foster commented 3 years ago

Yea, only leaves and supertaxa that passed the filter.

It would be nice if R understood double comparisons like 0 < MB5_2020_ill <= 0.01, but it does not. You would need MB5_2020_ill > 0 & MB5_2020_ill <= 0.01

No problem!

mr2raccoon commented 3 years ago

Thanks so much! I learn a lot every day!