grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation
Other
135 stars 28 forks source link

Phyloseq and metacoder #141

Closed wipperman closed 5 years ago

wipperman commented 7 years ago

Can you point me to a tutorial or accessor function for how to take an OTU table + metadata and convert this to a taxmap object, which can then be used to make plots? Is this possible with the package? I see that it is in the future directions in the paper, but am unable to figure it out myself (although am able to run all of the software and the examples!). Thanks so much for the help.

zachary-foster commented 7 years ago

Hi @wipperman! So you have a phyloseq object then? There is no function yet to handle this yet, but I can probably show you how to do it. I plan on adding this, but I am working on rewriting the taxmap class in the taxa repo, so a lot of upgrades are waiting on that. Once taxmap class is more flexible, it will make a lot of these compatibility issues easy to fix.

In the mean time, can you send me an example of how your data is formatted (an .Rdata file) or the code to produce it or a link to a phyloseq tutorial that produces an object with the same format? I can probably figure out how to do the conversion and perhaps add the function to the dev branch of metacoder. Thanks

wipperman commented 7 years ago

Sure! Here is some reproducible code that uses sample data from the Phyloseq package. The code will plot a tree on the left, with a differential abundance heatmap on the right, sorted by bodysite. I would like to use your package to show what genera (at any level of taxonomy, Order, Genus, etc) differ between just two of the groups (e.g., skin and soil). As for writing a wrapper function, it is this kind of question that I believe many users of Phyloseq would like to so (compare treatment vs control), so this may be a helpful example to incorporate since it uses data available through Phyloseq. Let me know if you have any other questions @zachary-foster and thanks so much for the help!

rm (list = ls()) library(phyloseq);library(ggstance);library(ggthemes);library(plyr);library(dplyr);library(ggtree);library(tibble)

set.seed(394582)

load sample Phyloseq dataset

data("GlobalPatterns") GP <- GlobalPatterns GP <- prune_taxa(taxa_sums(GP) > 500, GP) humantypes <- c("Feces", "Skin") #make it a bit simpler sample_data(GP)$human <- get_variable(GP, "SampleType") %in% humantypes

mergedGP <- merge_samples(GP, "SampleType") mergedGP <- rarefy_even_depth(mergedGP,rngseed=394582) mergedGP <- tax_glom(mergedGP,"Order") #make the tree smaller for the purposes of this example

print(mergedGP) #final phyloseq object

make a tree

tr <- phy_tree(mergedGP) spec <- tax_table(mergedGP) %>% data.frame(stringsAsFactors = FALSE) %>% tibble::rownames_to_column("otu") gt <- ggtree(tr, branch.length = "y") %<+% spec gd <- gt$data

melt_data<-psmelt(mergedGP)

tt <- melt_data %>%left_join(select(gd,OTU=label,x,y),by="OTU") %>% arrange(Sample) %>% mutate(sample2=factor(Sample,levels=unique(Sample)), col=as.numeric(sample2),x.col=scales::rescale(col,to=c(1.3,2)))

this will demonstrate that one can make a ggtree object with abundance data (other examples show this tpp)

g <- gt + geom_tippoint(data=gd$istip,aes(color=Phylum),size=2) + geom_text(data=gd$istip,aes(label=Genus,x=x+0.01,size='smaller[1]'),hjust=0.1,check_overlap = T)+ geom_tile(data=tt,aes(x=x.col,y=y,fill=Sample,alpha=Abundance)) + theme(legend.position="right") #+ geom_tiplab2() g

zachary-foster commented 7 years ago

Hi @wipperman, sorry about the delay. Last week was busy.

I have managed to make a less-than-elegant function to convert phyloseq objects to taxmap objects for use with metacoder. It will probably be in the dev version shortly. In the mean time, I have pasted it below and an example of how to use it. The code below assumes you have run the code you supplied above and that you are using the dev version of metacoder. This function will probably be rewritten in the next major release once the taxa package (where taxmap will live in the future) is on CRAN.

# install dev version of metacoder
devtools::install_github("grunwaldlab/metacoder@dev")
library(metacoder)

# First draft of conversion function 
phyloseq_to_taxmap <- function(phyloseq_obj) {
  tax_data <- data.frame(
    taxonomy = vapply(seq_len(nrow(phyloseq_obj@tax_table)), FUN.VALUE = character(1),
                      function(i) {
                        taxon_names <- as.vector(phyloseq_obj@tax_table[i, ])
                        rank_names <- colnames(phyloseq_obj@tax_table)
                        rank_names <- rank_names[!is.na(taxon_names)]
                        taxon_names <- taxon_names[!is.na(taxon_names)]
                        paste(rank_names, taxon_names, sep = "::", collapse = ";;")
                      }),
    phyloseq_id = rownames(phyloseq_obj@tax_table)
  )

  otu_data  <- as.data.frame(t(phyloseq_obj@otu_table))
  otu_data$phyloseq_id <- rownames(otu_data)
  tax_data <- merge(tax_data, otu_data, by.x = "phyloseq_id", by.y = "phyloseq_id")

  parse_taxonomy_table(tax_data, taxon_col = c("class" = 2), other_col_type = "obs_info",
                       class_regex = "^(.*)::(.*)$",
                       class_key = c(rank = "taxon_info", "name"),
                       class_sep = ";;")
}

# Convert to taxmap
data <- phyloseq_to_taxmap(mergedGP)

# Get proportions for each taxon from OTU counts
sample_names <- rownames(mergedGP@sam_data)
data$taxon_data[sample_names]  <- lapply(sample_names, function(col_name) {
  vapply(obs(data), function(indexes) sum(data$obs_data[indexes, col_name]) / sum(data$obs_data[, col_name]), numeric(1))
})

# Test plot of difference between ocean and soil
set.seed(2)
heat_tree(data,
          node_size = rowMeans(cbind(Ocean, Soil)),
          node_label = name,
          tree_label = name,
          node_color = Ocean - Soil,
          node_color_range = diverging_palette(),
          node_color_interval = c(-0.5, 0.5),
          edge_color_interval = c(-0.5, 0.5),
          node_label_max = 200,
          layout = "da", initial_layout = "re",
          title = "Ocean vs Soil",
          node_size_axis_label = "Average proportion")

rplot02

Feel free to ask any questions if something isn't clear.

wipperman commented 7 years ago

@zachary-foster thanks so much! I am able to reproduce it with my example. One oversight on my part is that OTU tables are usually structured by samples. In the example I gave, the work of merging samples by some variable (ocean, soil, etc) was done for you in the OTU table. A function that takes the OTU table and metadata, and merges many samples that fall into some category for a comparison (healthy/treatment, ocean/soil, etc) would be more helpful. It would add about one line of code, taking the metadata of the phyloseq object and merging samples based off of this metadata.

I can send another example as a .R file with some different data if you want to give it a try, or need other data to work with. I am not able to get this function to run on my own--I'm having issues merging the otu_table and tax_table using the merge function (it is deleting my data). I'll keep trying and let you know how it goes. But thanks for doing this--these trees will add a lot to the easy visualization of microbiome data comparisons.

zachary-foster commented 7 years ago

No problem @wipperman!

. A function that takes the OTU table and metadata, and merges many samples that fall into some category for a comparison (healthy/treatment, ocean/soil, etc) would be more helpful.

Yea, that is the data structure I am more used to seeing. One of the reasons I am making taxmap more flexible is to make encoding multiple samples per treatment easier to standardize so I can make functions that can interpret this information automatically. What I do currently is just have all samples for all treatments in one of the tables and a separate (i.e. not in the taxmap object) "metadata" table that has which samples are in which treatments. In the future, this metadata table will be in the taxmap object, like it is in phyloseq's object.

. I am not able to get this function to run on my own...

Im not sure what you mean, but I can help if you send me a part of your data.

Thanks

wipperman commented 7 years ago

Hey @zachary-foster

Here is a sample file where the sample data slot has "status" with two populations of people (healthy vs treatment). This is about as generic as one could get, and represents how people generally interact with phyloseq-class data. Github will only allow me to upload publically if its extension is .zip--just unarchive the file and substitute the phyloseq object like this:

phy <- readRSD("~/PATH/TO/example.phyloseq")

example.phyloseq.zip

Hope this helps, and let me know if you have any questions! --Matt

wipperman commented 7 years ago

Hey @zachary-foster Just wondering if you were able to view this/get it to load into R? Is there anything else I can help you with? Let me know if you have any other questions! --Matt

zachary-foster commented 7 years ago

Hi @wipperman,

Yea, thanks, I got it to load in R, but my draft parser does not work with it for some reason. I need to look into it more.

Is your work/research waiting on getting your data into metacoder from phyloseq? If so, I will prioritize getting it to work for you quickly. However, I will soon be reworking the parsing tools of metacoder to handle this kind of thing much better so any quick fixes now will likely be overwritten in the near future anyway.

The new version of taxmap that will be implemented in the taxa package should be able to hold all of the information on the phyloseq object easily. We plan to submit taxa to CRAN this month. After that, I will rewrite the file parsers / converters to use this new object and it will greatly simplify a lot of these issues, which is why I have not worked much in the past few weeks on the current parsers/converters.

Sorry if you are waiting on this for your work. If you need/want your data parsed soon, I can help you with your specific case and get it done quick likely. If you are more concerned with having an all-purpose phyloseq converter, that might be best put off until the taxa package is on CRAN in a few weeks.

wipperman commented 7 years ago

Hey @zachary-foster! Thank you for the help--it is not a big rush, I wanted to use the graphics in a talk that I am going to give at the end of the month, but if you give it a shot and are unable to get it to work, I'm able to wait and hold off until there is a phyloseq_to_taxmap-type function in your taxa package that I can use. I'll be sure to keep an eye out for it and adapt it. If you do manage to come up with a quick fix to your parser function in the meantime that would accommodate the datatype from the phyloseq object that I sent, please let me know. Thanks! Best, --Matt

zachary-foster commented 7 years ago

This is how I might parse phyloseq objects once the taxa package is integrated:

> parse_tax_data(as.data.frame(mergedGP@tax_table), as.data.frame(t(mergedGP@otu_table)), class_cols = 1:4, mappings = c("{{name}}" = "{{name}}"))
<Taxmap>
  212 taxa: 1. Archaea, 2. Bacteria, 3. Crenarchaeota ... 209. Gemellales, 210. Turicibacterales, 211. Lactobacillales, 212. Bacillales
  212 edges: NA->1, NA->2, 1->3, 1->4, 2->5, 2->6, 2->7, 2->8 ... 79->204, 79->205, 79->206, 79->207, 80->208, 81->209, 81->210, 81->211, 81->212
  1 data sets:
    [[1]]:
      # A tibble: 131 x 10
        taxon_id Feces Freshwater `Freshwater (creek)`  Mock Ocean `Sediment (estuary)`  Skin  Soil Tongue
           <chr> <dbl>      <dbl>                <dbl> <dbl> <dbl>                <dbl> <dbl> <dbl>  <dbl>
      1       82     4          1                    2    19   343                21166     9   278      8
      2       83     1          0                    1    14     1                  378   134  9069      1
      3       84     1         14                 2765     1    15                   10     0     0      2
      # ... with 128 more rows
  0 functions:

This is still being refined, but it seems to work in the dev version of taxa. @wipperman: I am mostly putting this here for my notes, but I thought you might be interested as well.

wipperman commented 7 years ago

Hey @zachary-foster, I am definitely still interested--let me download the dev version of taxa, and I'll give it a shot/see what I can do. If you have some sample code from the example phyloseq object I gave you, let me know. Additionally, if I get something to work, I'll post it on here. Thanks so much!

zachary-foster commented 7 years ago

Cool, thanks @wipperman! However, the functions in metacoder are not yet adapted to the new taxmap class defined in taxa. I am working on adapting them now. I would be happy to have your input on the taxa package, but the plotting done by metacoder will not work with this class yet fyi.

wipperman commented 7 years ago

Gotcha--no problem. I've already downloaded the dev version of taxa so I might as well try to learn what you're doing with it. When it's all ready let me know! Thank you @zachary-foster!

zachary-foster commented 7 years ago

Hi @wipperman, I have made a draft of the new phylseq object parser that uses the taxa package. I also updated the heat_tree function to use this object. There is still more work needed to get things stable enough for a release, but the fundamentals are there.

Previously you said:

A function that takes the OTU table and metadata, and merges many samples that fall into some category for a comparison (healthy/treatment, ocean/soil, etc) would be more helpful.

I agree, but I have not done that yet. Should not be hard though, assuming all phyloseq objects have sample data and otu tables in the same format.

You can try out what I am working on with the following code:

devtools::install_github("grunwaldlab/metacoder")
devtools::install_github("ropensci/taxa")

library(metacoder)
library(taxa)

phy <- readRDS("example.phyloseq")
obj <- parse_phyloseq(phy)

obj$filter_taxa(taxon_names == "Bacteria", subtaxa = TRUE)

healthy_samples <- obj$data$sam_data$sample_id[obj$data$sam_data$status == "healthy"]
healthy_counts <- obs_apply(obj, "otu_table", function(i) sum(obj$data$otu_table[i, healthy_samples]), simplify = TRUE)

obj %>%
  heat_tree(node_size = healthy_counts, 
            node_color = healthy_counts,
            node_label = taxon_names)

Let me know if you see any changes you would like to see to the object returned by parse_phyloseq.

wipperman commented 7 years ago

Hey @zachary-foster ...this looks great to me. I've attached the output from your code, and think it looks great!

One idea may be to allow for two continuous color scales to be plotted onto the same tree. So when comparing healthy vs treatment, the differences are more obvious (it's almost like a Circos plot but with an actually phylogenetic tree). Another suggestion is to put in the help section of the "parse_phyloseq" function this example, or something like it, so that people can understand how to go from parsing to plotting.

I'll play around with it some more, but at a first pass, this is exactly what I had in mind! Thanks so much!

healthy_taxmap

zachary-foster commented 7 years ago

Thanks for trying it out @wipperman! Im glad you like it so far

One idea may be to allow for two continuous color scales to be plotted onto the same tree.

That can kind of be done by setting the edge_color to something different than the node_color. I am also planning on making "piecharts" of each of the nodes when more than one value is given per taxon, so that might be an option in the future. You can also compare two samples by subtracting/dividing the values for each taxon, as was done in the ocean vs soil plot above.

Another suggestion is to put in the help section of the "parse_phyloseq" function this example, or something like it, so that people can understand how to go from parsing to plotting.

Good idea, I will do that.

wipperman commented 7 years ago

@zachary-foster So if you wanted to plot both healthy and treatment on the same tree using your above example, how would this be done? Something like this is what I have in mind. I fiddled with changing the edge_color, but although this gave two scales, I don't think it was right. Thanks!

healthy_samples <- obj$data$sam_data$sample_id[obj$data$sam_data$status == "healthy"] healthy_counts <- obs_apply(obj, "otu_table", function(i) sum(obj$data$otu_table[i, healthy_samples]), simplify = TRUE)

treat_samples <- obj$data$sam_data$sample_id[obj$data$sam_data$status == "treatment"] treat_counts <- obs_apply(obj, "otu_table", function(i) sum(obj$data$otu_table[i, treat_samples]), simplify = TRUE)

obj %>% heat_tree(node_size = c(treat_counts,healthy_counts), node_color = c(treat_counts,healthy_counts), node_label = taxon_names)

zachary-foster commented 7 years ago

Hi @wipperman, sorry for the delay! I was traveling when I noticed this issue and forgot to respond until now.

Here is how I typically compare two treatments. This code also uses three new functions: calc_obs_props, calc_taxon_abund, and compare_treatments to make this kind of thing easier. I think compare_treatments should address you previous suggestion:

A function that takes the OTU table and metadata, and merges many samples that fall into some category for a comparison (healthy/treatment, ocean/soil, etc) would be more helpful.

You will have to install the most recent dev version again.

devtools::install_github("grunwaldlab/metacoder")
devtools::install_github("ropensci/taxa")

library(metacoder)

phy <- readRDS("example.phyloseq")
obj <- parse_phyloseq(phy)

# Convert counts to proportions
obj$data$otu_table <- calc_obs_props(obj,
                                     dataset = "otu_table",
                                     cols = obj$data$sam_data$sample_ids)

# Calculate per-taxon proportions 
obj$data$tax_table <- calc_taxon_abund(obj, 
                                       dataset = "otu_table", 
                                       cols = obj$data$sam_data$sample_ids)

# Calculate difference between treatments
obj$data$diff_table <- compare_treatments(obj,
                                          dataset = "tax_table",
                                          sample_ids = obj$data$sam_data$sample_ids,
                                          treatments = obj$data$sam_data$status)

# Plot differneces
color_interval <- c(-5, 5) # The range of values (log 2 ratio of median proportion) to display
obj %>%
  taxa::filter_taxa(taxon_names == "Bacteria", subtaxa = TRUE) %>%
  heat_tree(node_size_axis_label = "Number of OTUs",
            node_size = n_obs,
            node_color_axis_label = "Log 2 ratio of median proportions",
            node_color = log2_median_ratio,
            node_color_range = diverging_palette(),
            node_color_trans = "linear",
            node_color_interval = color_interval,
            edge_color_interval = color_interval,
            node_label = taxon_names,
            node_label_max = 150)

rplot

Green == more abundant in healthy Brown == more abundant in treatment

wipperman commented 6 years ago

Hey @zachary-foster --this has been working great for me so far! One additional question that you may want to add to the tutorial page as well is how to determine which color corresponds to which "Treatment". So in the above example, Green is more abundant in healthy and Brown more abundant in Treatment, but for other comparisons, how is the color scheme determined? Is it alphabetical? How can I determine this? Thanks!

zachary-foster commented 6 years ago

Hi @wipperman, glad to hear it!

Good idea, I will look into this. It depends on the function used. With the default function, it is "sample_1" - "sample_2", with the order of samples the same as they appear in "obj$data$diff_table" is the above example . If the user defines their own function, I cannot determine what the numbers mean however

wipperman commented 6 years ago

@zachary-foster Thanks! I think this will confuse people (or at least it will confuse me!) so adding it to either the output of the function or the help section would be useful.

So just to be clear: using your function "compare_treatments", the output tibble has headers: "taxon_id, treatment_1, treatment_2, log2_median_ratio, median_diff, mean_diff, wilcox_p_value".

The color closest to Node 1 corresponds to treatment_2, and the color closer to the last node (n=# of OTUs) is treatment_1? For me, "treatment_1" is "healthy" and "treatment_2" is "treatment".

zachary-foster commented 6 years ago

I think this will confuse people (or at least it will confuse me!) so adding it to either the output of the function or the help section would be useful.

I agree, I will add something about this to the documentation.

The color closest to Node 1 corresponds to treatment_2, and the color closer to the last node (n=# of OTUs) is treatment_1? For me, "treatment_1" is "healthy" and "treatment_2" is "treatment".

Thats right. Its treatmeant_1 - treatment_2, so a positive number indicates more in treatment_1.

jarrodscott commented 6 years ago

Hello @zachary-foster

I am attempting to follow the "tutorial" in this thread. any reason I should get Error: could not find function "compare_treatments"? Could this possibly have anything to do with my data? Seem unlikely...

I have tried installing the devtools and stable versions. current version of metacoder installed is ‘0.2.0’ Maybe this has been deprecated in favor of compare_groups? However compare_treatments looks like a brand new function. Anyway thanks!

zachary-foster commented 6 years ago

Hello @jjscarpa. Yes, compare_groups is the name I decided on when I released the package to CRAN a few days ago. It used to be called compare_treatments. Its the same function with different option names. I changed it to make it more consistent with the other functions.

Take a look at the readme on this repo for a more up to date tutorial:

https://github.com/grunwaldlab/metacoder

jarrodscott commented 6 years ago

awesome thanks for the info. Really looking into checking out this analytical approach. Super cool. Thanks again

On Mon, Jan 8, 2018 at 12:04 PM, Zachary Foster notifications@github.com wrote:

Hello @jjscarpa https://github.com/jjscarpa. Yes, compare_groups is the name I decided on when I released the package to CRAN a few days ago. It used to be called compare_treatments. Its the same function with different option names. I changed it to make it more consistent with the other functions.

Take a look at the readme on this repo for a more up to date tutorial:

https://github.com/grunwaldlab/metacoder

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/grunwaldlab/metacoder/issues/141#issuecomment-356028062, or mute the thread https://github.com/notifications/unsubscribe-auth/APcStpoajWHSfbhy8uGKNf1yX6kkmOZYks5tIkqFgaJpZM4M5ace .

zachary-foster commented 6 years ago

No problem, thanks!

Marieag commented 6 years ago

Hi!

I am also very much interested in using metacoder for my project, I've got a phyloseq object I'd like to investigate, but I also hit a snag trying to run your function, I get a could not find function "parse_taxonomy_table" . I would be very happy if you could help me work around it?

I'm running metacoder v0.2.0.9007, just downloaded, with R 3.4.3 and newly installed packages, vegan v2.4.6 and phyloseq v1.22.3.

My dataset was constructed in Qiime, not mothur, so I'm not sure what to do, if I can't transform it directly from the phyloseq object. Any help would be very much appreciated!

Thanks!

zachary-foster commented 6 years ago

Hello @Marieag, are you using the phyloseq_to_taxmap that I posted above a while ago? If so, that has been replaced with parse_phyloseq included in the metacoder package (try ?metacoder::parse_phyloseq). I did find a bug in that parser shortly after the last CRAN release, so I suggest you install the dev version:

devtools::install_github("ropensci/taxa")
devtools::install_github("grunwaldlab/metacoder")

You are getting that error because metacoder::parse_taxonomy_table has been replaced by taxa::parse_tax_data.

Marieag commented 6 years ago

Hi, thanks for your reply!

I was indeed using your phyloseq_to_taxmap . I reinstalled metacoder and managed to get a taxmap-object using parse_phyloseq, but when trying to use it to make heat trees, I get the error

Error in if (zero_range(as.numeric(limits))) { : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In FUN(X[[i]], ...) :
  Could not apply layout 'reingold-tilford' to subgraph. Using 'fruchterman-reingold' instead.
2: In min(element_data$y) : no non-missing arguments to min; returning Inf

Any idea what might cause this?

zachary-foster commented 6 years ago

Hmm, hard to say without seeing your object. Can you use save to send me the phyloseq object and the code you use to parse and plot it? Thanks

Marieag commented 6 years ago

Of course! I should have added that in the last comment.

Overall.zip

I'm using the following code (let me know if you need more than this, I'm assuming you'll only need the metacoder code?)

###-----------Metacoder----------

#parse phyloseq object Overall, created from Qiime .biom-file and table of metadata. 
obj <- parse_phyloseq(Overall)

#removing counts of less than 1
obj$data$otu_table <- zero_low_counts(obj, "otu_table", min_count = 1)

no_reads <- rowSums(obj$data$otu_table[,2:39]) == 0
sum(no_reads)

obj <- filter_obs(obj, "tax_data", ! no_reads, drop_taxa = TRUE)
print(obj)

# Convert counts to proportions
obj$data$otu_table <- calc_obs_props(obj,
                                     dataset = "otu_table",
                                     cols = obj$data$sam_data$sample_ids)

# Calculate per-taxon proportions 
obj$data$tax_table <- calc_taxon_abund(obj, 
                                       dataset = "otu_table", 
                                       cols = obj$data$sam_data$sample_ids)
#construct heat tree
heat_tree(obj, 
          node_label = taxon_names,
          node_size = n_obs,
          node_color = Type)

# Calculate difference between treatments
obj$data$diff_table <- compare_groups(obj, daaset = "tax_table",
                                      cols = obj$data$tax_table$taxon_id,
                                      groups = obj$data$sam_data$Type)

print(obj$data$diff_table)

heat_tree(obj, 
          node_label = taxon_names,
          node_size = n_obs,
          node_color = Type, 
          node_size_axis_label = "OTU count",
          node_color_axis_label = "Samples with reads")

I'm getting the same error with both heat tree attempts, and also when running this bit of code on the GlobalPatterns dataset. I'm probably missing something really obvious - thank you so much for helping me, I really appreciate it!

xpingli commented 6 years ago

obj <- filter_obs(obj, "tax_data", ! no_reads, drop_taxa = TRUE) change the "tax_data" to "otu_table"; '# Calculate per-taxon proportions obj$data$tax_table <- calc_taxon_abund(obj, dataset = "otu_table", cols = obj$data$sam_data$sample_ids)' change $tax_table to $tax_abund

This is what I used to mine as I was following this issue. It worked for me but I am not sure if this would help your case.

zachary-foster commented 6 years ago

Thanks for the input @xpingli!

@Marieag, sorry for not getting back to you; I forgot about this and @xpingli's reply reminded me. I just looked at it and the problem is that you are trying to use "Type" for node color. There are two reasons this does not work:

I will try to make the error message more informative though.

Are you trying to compare "Control" and "Exclosure"? How about something like:

load("Overall.RData")

library(metacoder)

obj <- parse_phyloseq(Overall)

# Convert counts to proportions
obj$data$otu_table <- calc_obs_props(obj,
                                     dataset = "otu_table",
                                     cols = obj$data$sam_data$sample_ids)

# Calculate per-taxon proportions 
obj$data$tax_table <- calc_taxon_abund(obj, 
                                       dataset = "otu_table", 
                                       cols = obj$data$sam_data$sample_ids)

# construct heat tree
heat_tree(obj, 
          node_label = taxon_names,
          tree_label = taxon_names,
          node_size = n_obs,
          node_color = n_obs,
          node_size_axis_label = "OTU count",
          layout = "da", initial_layout = "re")

# Calculate difference between treatments
obj$data$diff_table <- compare_groups(obj, dataset = "tax_table",
                                      cols = obj$data$sam_data$sample_ids,
                                      groups = obj$data$sam_data$Type)

# comapre treatments
heat_tree(obj, 
          node_label = taxon_names,
          tree_label = taxon_names,
          node_size = n_obs,
          node_color = log2_median_ratio, 
          node_color_interval = c(-3, 3),
          node_color_range = c("cyan", "gray", "magenta"),
          node_size_axis_label = "OTU count",
          node_color_axis_label = "Log 2 ratio of median proportions",
          layout = "da", initial_layout = "re")

rplot

You might want to reinstall the dev versions of metacoder and taxa.

If you want to plot just the control or encolosure read counts, try using the cols and groups options in calc_taxon_abund (?calc_taxon_abund)

tarunaaggarwal commented 6 years ago

Hi @zachary-foster,

I'm trying to utilize the commands within this thread but I can't even get past parse_phyloseq. I have a merged Phyloseq object, and when I try to parse this object using the parse function, I'm getting Error: Column taxon_id must be a 1d atomic vector or a list. I found this online but I'm not quite sure how to implement this solution to a phyloseq object.

My code so far is below (it's not much). I'm using Phyloseq 1.22.3 and just installed Metacoder yesterday so the version is 0.2.0.9012. Thank you for your help and time in advance!

library(phyloseq)
library(taxa)
library(metacoder)
print(hmp_otus)
print(hmp_samples)

setwd("~/Dropbox/bik_lab/github/Funginomicon/memb-files")

# Import mapping file into phyloseq
map <- import_qiime_sample_data("18S-euk-QIIME-mapping-MEmicrobiome-FINAL-26Jul17.txt")

# Import biom tables into phyloseq
otu <- import_biom("otu_table_mc2_w_tax_BlankOTUsRemoved_BlankSamplesRemoved.biom")

#merge mapping file and otu tables into a phyloseq object 
physeq = merge_phyloseq(map,otu)
tax_table(physeq)

#check the otu and tax tables
otu_table(physeq)[1:5]
tax_table(physeq)[1:5]

#parse the phyloseq object for metacoder  
physeq_mc = parse_phyloseq(physeq)

heat_tree(physeq_mc,
          node_size = n_obs,
          node_color = n_obs,
          node_label = taxon_names,
          tree_label = taxon_names)

Output from sesionInfo():


R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] metacoder_0.2.0.9012 taxa_0.2.1 phyloseq_1.22.3

loaded via a namespace (and not attached): [1] reshape2_1.4.3 splines_3.4.2 lattice_0.20-35 rhdf5_2.22.0
[5] colorspace_1.3-2 stats4_3.4.2 mgcv_1.8-22 utf8_1.1.3
[9] survival_2.41-3 rlang_0.2.0 pillar_1.2.1 glue_1.2.0
[13] BiocGenerics_0.24.0 bindrcpp_0.2 foreach_1.4.4 bindr_0.1
[17] plyr_1.8.4 stringr_1.3.0 zlibbioc_1.24.0 Biostrings_2.46.0
[21] munsell_0.4.3 gtable_0.2.0 codetools_0.2-15 Biobase_2.38.0
[25] permute_0.9-4 IRanges_2.12.0 biomformat_1.6.0 parallel_3.4.2
[29] Rcpp_0.12.16 scales_0.5.0 vegan_2.5-1 S4Vectors_0.16.0
[33] jsonlite_1.5 XVector_0.18.0 ggplot2_2.2.1 stringi_1.1.7
[37] dplyr_0.7.4 grid_3.4.2 ade4_1.7-8 cli_1.0.0
[41] tools_3.4.2 magrittr_1.5 lazyeval_0.2.1 tibble_1.4.2
[45] cluster_2.0.6 crayon_1.3.4 ape_5.1 pkgconfig_2.0.1
[49] MASS_7.3-47 Matrix_1.2-12 data.table_1.10.4-3 assertthat_0.2.0
[53] rstudioapi_0.7 iterators_1.0.9 R6_2.2.2 multtest_2.34.0
[57] igraph_1.2.1 nlme_3.1-131 compiler_3.4.2


Corresponding mapping file and biom table are [here](https://www.dropbox.com/sh/narv17v5z7vxahf/AABEURI2FmWAEuZPvaesAETQa?dl=0).
zachary-foster commented 6 years ago

Hello @tarunaaggarwal, sorry for the trouble.

I have not seen that error before. Can you send me the result of sessionInfo() and the object using save(my_obj, file = "my_file_name.RData")? Its hard for me to diagnose without reproducing the error. Thanks!

tarunaaggarwal commented 6 years ago

Hi @zachary-foster - Sorry I posted the question before I was ready. I'm working on getting you the files. I will edit my original post. Just need a couple more mins

tarunaaggarwal commented 6 years ago

Hi @zachary-foster. Did my files go through okay?

zachary-foster commented 6 years ago

I don't see them here, did you email them to me?

tarunaaggarwal commented 6 years ago

They are at the bottom of my original post as a link.

zachary-foster commented 6 years ago

Oh, I see them now. Sorry about that. I was expecting to get an email alert, but I guess they don't send out email alerts for recently edited posts. Thanks for the files! I will look at this today.

zachary-foster commented 6 years ago

@tarunaaggarwal,

I think I have it fixed. It was caused parse_phyloseq not knowing what to do with the arbitrary rank names in the phyloseq object. I have not seen that before, so I removed the requirements on rank names and it seems to work. You will need to reinstall:

devtools::install_github("ropensci/taxa")
devtools::install_github("grunwaldlab/metacoder")

This works for me:

library(phyloseq)
library(taxa)
library(metacoder)

# Import mapping file into phyloseq
map <- import_qiime_sample_data("18S-euk-QIIME-mapping-MEmicrobiome-FINAL-26Jul17.txt")

# Import biom tables into phyloseq
otu <- import_biom("otu_table_mc2_w_tax_BlankOTUsRemoved_BlankSamplesRemoved.biom")

# merge mapping file and otu tables into a phyloseq object 
physeq = merge_phyloseq(map,otu)

# parse the phyloseq object for metacoder  
physeq_mc = parse_phyloseq(physeq)

# ALTERNATIVE: Parse with cleaner taxon names
#physeq_mc = parse_phyloseq(physeq,
#                           class_regex = "^D?_?[0-9]*_?_?(.+)$")

# Plot
heat_tree(physeq_mc,
          node_size = n_obs,
          node_color = n_obs,
          node_label = taxon_names,
          tree_label = taxon_names)

Note that there is a parser for the QIIME biome format called parse_qiime_biom. This also works:

library(metacoder)

# Using the biome parser in metacoder
physeq_mc = parse_qiime_biom("otu_table_mc2_w_tax_BlankOTUsRemoved_BlankSamplesRemoved.biom")

# ALTERNATIVE: Using the biome parser in metacoder with cleaner taxon names
#physeq_mc = parse_qiime_biom("otu_table_mc2_w_tax_BlankOTUsRemoved_BlankSamplesRemoved.biom",
#                             class_regex = "^D?_?[0-9]*_?_?(.+)$")

# Plotting
heat_tree(physeq_mc,
          node_size = n_obs,
          node_color = n_obs,
          node_label = taxon_names,
          tree_label = taxon_names)
tarunaaggarwal commented 6 years ago

Hi @zachary-foster - thank you very much for working on this but unfortunately, I'm still getting errors (different ones now though). I tried both of your parsing commands after reinstalling the repos and I think these errors are related to installation :-/ Please see the commands and errors below.

devtools::install_github("ropensci/taxa")
devtools::install_github("grunwaldlab/metacoder")
library(phyloseq)
library(taxa)
library(metacoder)

# Import mapping file into phyloseq
map <- import_qiime_sample_data("18S-euk-QIIME-mapping-MEmicrobiome-FINAL-26Jul17.txt")

# Import biom tables into phyloseq
otu <- import_biom("otu_table_mc2_w_tax_BlankOTUsRemoved_BlankSamplesRemoved.biom")

#merge mapping file and otu tables into a phyloseq object 
physeq = merge_phyloseq(map,otu)

#parse the phyloseq object for metacoder  
physeq_mc = parse_phyloseq(physeq)

gives the following error

Error in getExportedValue(pkg, name) : 
  lazy-load database '/Library/Frameworks/R.framework/Versions/3.4/Resources/library/taxa/data/Rdata.rdb' is corrupt
In addition: Warning message:
In getExportedValue(pkg, name) : internal error -3 in R_decompress1

AND

physeq_mc = parse_qiime_biom("otu_table_mc2_w_tax_BlankOTUsRemoved_BlankSamplesRemoved.biom")

gives a similar error.

Error in get(obj) : 
  lazy-load database '/Library/Frameworks/R.framework/Versions/3.4/Resources/library/taxa/data/Rdata.rdb' is corrupt
In addition: Warning messages:
1: In strsplit(msg, "\n") : input string 1 is invalid in this locale
2: In get(obj) : internal error -3 in R_decompress1
zachary-foster commented 6 years ago

Yea, I see those occasionally. Try restarting R/Rstudio. That fixes it when it happens to me. I think it is caused by reinstalling packages that are currently loaded.

tarunaaggarwal commented 6 years ago

Lovely! Thank you so much! Now, I can finally try what others have posted in this issue thread :-) Thank you, @zachary-foster!

zachary-foster commented 6 years ago

No problem! Im glad it worked

tarunaaggarwal commented 6 years ago

Hi @zachary-foster - thank you for your help earlier this week. I ran into another issue that I was hoping to get your help with please. I have a biom table that I parse using your alternative approach and this table contains only ONE sample. Perhaps that is why the rowSums is giving an error but I'm not sure.

Code


obj = parse_qiime_biom("otu_table_mc2_w_tax_BlankOTUsRemoved_BlankSamplesRemoved_nem46.biom", 
class_regex = "^D?_?[0-9]*_?_?(.+)$")

obj$data$otu_table <- zero_low_counts(obj, "otu_table", min_count = 5) no_reads <- rowSums(obj$data$otu_table[, obj$sample_id]) == 0


> `rowSums` is giving the error below.

Error: Unsupported index type: NULL



I put this biom table file in the same folder that I shared with you via Dropbox. THANKS!
zachary-foster commented 6 years ago

Hi @tarunaaggarwal, I think it is because obj$sample_id does not exist. In the example, there was a table called hmp_samples that contained the sample IDs, which were the column names of the abundance matrix. Since you only have one sample,

no_reads <- obj$data$otu_table[, 1] == 0

should work the same. rowSums is not needed. There are probably other parts of the code you will need to change as well, since you dont have a sample data table and just have one sample.

tarunaaggarwal commented 6 years ago

Ah I see. Lovely! Thanks, @zachary-foster. It is so nice how responsive you are.

tarunaaggarwal commented 6 years ago

Hey @zachary-foster - I forgot to ask you a question. Can Metacoder make graphs for each sample individually? I have a biom table for OTUs associated with single nematodes as samples and if I wish to make a graph for each worm separately, is there an easy way to do that in Metacoder? Also, can I make such separate graphs for a particular taxon such as Fungi only?

zachary-foster commented 6 years ago

Can Metacoder make graphs for each sample individually?

It does not do it automatically, but its quite easy to do. You can put the heat_tree command in a loop / apply function and change the column name for the counts you want to graph in each iteration. Something like:

plot_one <- function(my_col_name) {
  heat_tree(obj,
    node_color = obj$data$my_table[[my_col_name]],
    output_file = paste0("plot for sample ", my_col_name, ".pdf"),
    ...) # other commands
}
my_plots <- lapply(my_sample_names, plot_one)

You could also use a for loop.

Also, can I make such separate graphs for a particular taxon such as Fungi only?

Yes, even easier. Use filter_taxa to subset the data before graphing. If you are only subsetting for the graph, you can use the %>% operator to avoid subsetting the data in the rest of your anaysis or making a copy of it.

obs %>%
  filter_taxa(taxon_names == "Funig", subtaxa = TRUE) %>%
  heat_tree(...)