grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation
Other
136 stars 28 forks source link

Creating a facet with multiple heat trees #263

Open juanu opened 5 years ago

juanu commented 5 years ago

Hi,

Thanks for this amazing package, I've been playing with it for a few days, and it will be really useful with my data :)

I have a question regarding the heat trees, that I could't find an answer elsewhere. I'm trying to visualize changes over time of my dataset, by doing individual heat trees of the different time points. The problem is that for each tree, the topology changes a little, which makes the comparisons difficult.

Is there a way to make a facet of multiple heat trees, using the timepoint variable for faceting? Or any other ideas/suggestions? Thanks!

zachary-foster commented 5 years ago

Hi @juanu, thanks!

Right now, there is no analog to the faceting concept in ggplot2, but you can make them all look the same by setting them same random seed before each plot using set.seed(some_number). See:

https://grunwaldlab.github.io/metacoder_documentation/faq.html#how-can-i-make-a-heat-tree-look-the-same-each-time

You can save multiple plots and combine them into one using packages like gridExtra:

library(metacoder)
#> Loading required package: taxa
#> This is metacoder verison 0.3.2 (stable). If you use metacoder for published research, please cite our paper:
#> 
#> Foster Z, Sharpton T and Grunwald N (2017). "Metacoder: An R package for visualization and manipulation of community taxonomic diversity data." PLOS Computational Biology, 13(2), pp. 1-15. doi: 10.1371/journal.pcbi.1005404
#> 
#> Enter `citation("metacoder")` for a BibTeX entry for this citation.
library(gridExtra)
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "info", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")
my_plots <- lapply(c(1,2, 3), function(time) {
  # do something with time point data
  set.seed(1) # Make each plot layout look the same
  heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs, layout = "da")
})
grid.arrange(grobs = my_plots, nrow = 1)

Created on 2019-04-15 by the reprex package (v0.2.1)

See https://cran.r-project.org/web/packages/egg/vignettes/Ecosystem.html for more info on combining plots.

juanu commented 5 years ago

Thanks @zachary-foster !! This definitely helps, in particular the use of gridExtra.

I'm still unsure on how I could create the individual plots for the grid, based on the sample metadata. What I'm trying to achieve is that each of the subplots, would have the same topology for the tree, but the count data will be different according to a specific timepoint. I've been playing with filter_obs, but it doesn't generate what I need.

Also, I keep getting this error (well, maybe I'm the one making a mistake :) ) with filter_obs. When I try to filter based on _sampledata, and use the option _droptaxa=TRUE , I keep getting this error message:

There is no "taxon_id" column in the data set "3", so there are no taxon IDs

I'm away from the computer, but I can try to generate a small example dataset. I started with a phyloseq object, that I imported using the parse_phyloseq command.

Thanks!

zachary-foster commented 5 years ago

No problem @juanu!

What I'm trying to achieve is that each of the subplots, would have the same topology for the tree,

To do this, all trees need to have the same set of taxa, so you cant filter taxa (with drop_taxa = TRUE), but modify the counts. Some taxa will have a count of 0 for some time points.

but the count data will be different according to a specific time point.

Yes, so you need a different set of counts for each time point, preferably as a column of counts for each time point in a table of per-taxon counts (one row per taxon). Once you have that, you plot the data for a different column of counts in each iteration of the loop generating the plot. You can generate these per-time-point columns by averaging/summing per-sample taxon counts using sample metadata information using calc_taxon_abund, possibly in combination with calc_group_mean, as is shown here:

https://grunwaldlab.github.io/metacoder_documentation/workshop--07--diversity_stats.html#plotting-taxon-abundance-with-heat-trees

I keep getting this error message:

that just means one of you data sets is not classified by taxa, which means it cant be filtered by taxon info. Its more of a warning than an error.

I can try to generate a small example dataset.

If you cant figure it out from the above link, then feel free to send me that and I can help you figure it out.

Make sure you have the latest version of metacoder installed from CRAN, since I recently fixed a bug that has a small chance of effecting this analysis, depending on how you do it.

juanu commented 5 years ago

Thanks! @zachary-foster ! So, you were right on point with the comment on drop_taxa=TRUE, and I should have thought about it too 🤦‍♂️ . Basically, when doing the transformation, the taxa that is not present will have counts of 0s...which means that it will be on the dataset, but it won't have a value.

Anyway, thanks for all your suggestions. I need to work on the details now, but I'm back on track with the plots! I have to play now with the details of the sampling groups (this is just a quick example) and the visual parameters (for example, use just one node legend for all the three plots, with all in the same scale).

Thanks for all the help and quick replies :)

combined_plots

zachary-foster commented 5 years ago

Nice!

You probably want to set node size/color range to be the same by node_size_range and node_color_range, so all the plots have the same scale and the trees are the same size. Then, once you are done and ready for publication, you can delete two of the three legends, since they will be the same.

taniafort commented 5 years ago

Hi,

Thanks for this package Zachary! I'm just starting to use it and there are many very useful functions in there! I'm having the same issue as Juanu but I'm not as skilled as he is.

Could either of you two provide a complete example, please?

Thanks so much in advance,

zachary-foster commented 5 years ago

Hi @taniafort,

Does the example on my first reply do what you need? If not, what other parts do you need?

taniafort commented 5 years ago

Hi Zachary,

Thanks for the quick reply! It's just the part where you wrote "do something with time point data" that I don't know how to code.

Thanks in advance and sorry for the silly question,

zachary-foster commented 5 years ago

Ahh ok, that would depend on how your data is formatted and what you want to plot. Typically, you would filter the data to the subset plotted in each graph. Its hard for me to give a concrete example without seeing how your data is formatted and knowing what you want to plot. If you email me your data and tell me what you want to plot I can probably point you in the right direction. zacharyfoster1989 gmail

taniafort commented 5 years ago

I would like to compare miseq and sanger sequencing of bacterial communities.

I have one sample for each type of sequencing and I'd like to plot the number of sequences assigned to the genus rank for the miseq sequencing sample on one plot and do the same on another plot for sanger sequencing (I just e-mailed you the data).

zachary-foster commented 5 years ago

Hi @taniafort,

In your case, you need to say which column you want to plot at each iteration of the loop. See the example below. Let me know if you have questions.

# Load libraries
library(metacoder)
#> Loading required package: taxa
#> This is metacoder verison 0.3.2.9001 (development version)
library(phyloseq)
#> 
#> Attaching package: 'phyloseq'
#> The following object is masked from 'package:taxa':
#> 
#>     filter_taxa
library(gridExtra)
library(ggplot2)
#> 
#> Attaching package: 'ggplot2'
#> The following object is masked from 'package:taxa':
#> 
#>     map_data

# Read in data
getwd()
#> [1] "/tmp/Rtmpaj3ecd/reprex15a6213115d4"
phylo_obj <- readRDS('~/Downloads/ps.miseq.sanger.RDS') # Change file path
tm_obj <- parse_phyloseq(phylo_obj)

# Count seqs per taxon
tm_obj$data$tax_counts <- calc_taxon_abund(tm_obj, data = 'otu_table')
#> No `cols` specified, so using all numeric columns:
#>    Sanger, Miseq
#> Summing per-taxon counts from 2 columns for 193 taxa

# Make plots
my_plots <- lapply(c('Sanger', 'Miseq'), function(type) {
  # do something with time point data
  set.seed(2) # Make each plot layout look the same if using another layout
  heat_tree(tm_obj,
            node_label = taxon_names,
            node_size = n_obs,
            node_color = tm_obj$data$tax_counts[[type]],
            node_size_axis_label = 'OTU count',
            node_color_axis_label = 'Sequence count',
            node_size_range = c(0.013, 0.04), # makes scale range the same in both
            title = type)
})

# Combine plots
combined <- grid.arrange(grobs = my_plots, nrow = 1)


# Save output
ggsave(combined, filename = 'sanger_vs_miseq.pdf', width = 7, height = 3)

Created on 2019-06-20 by the reprex package (v0.3.0)

taniafort commented 5 years ago

Thank you so much Zachary!! This is exactly what I wanted! :)

dylbaker commented 1 year ago

Hello, is there a way to "collect" legends and add tag labels (e.g. A-F) (via patchwork) for plots that are created this way? I have been able to manually edit the image in Adobe, but would like an option where I can convert the plot to a ggplot object and then arrange further the way I would like. I followed the example above to get the grid.arrange() object, but then I am stuck from there. The first image is what I originally got, and the second image is what I want (edited first image in adobe, but would like a reproducible solution) seed_vs_xenic_flipped_old seed_vs_xenic_flipped

zachary-foster commented 1 year ago

Not at the moment, but this has been something I have wanted to add for a while. I will try see what I can do.

dylbaker commented 1 year ago

Thanks!