Closed wipperman closed 5 years ago
Hey @zachary-foster... So I'm very close to getting the graph for only fungal groups but keep running into an edge_label error.
obj = parse_qiime_biom("otu_table_mc2_w_tax_BlankOTUsRemoved_BlankSamplesRemoved_nem46.biom",
class_regex = "^D?_?[0-9]*_?_?(.+)$")
print(obj)
obj$data$otu_table <- zero_low_counts(obj, "otu_table", min_count = 1)
no_reads <- obj$data$otu_table[, 1] == 0
sum(no_reads)
# Convert counts to proportions
obj$data$otu_table <- calc_obs_props(obj,
dataset = "otu_table",
cols = obj$data$sam_data$sample_ids)
# Calculate per-taxon proportions
obj$data$otu_table <- calc_taxon_abund(obj,
dataset = "otu_table",
cols = obj$data$sam_data$sample_ids)
# construct heat tree
obj %>%
filter_taxa(taxon_names == "Fungi", subtaxa = TRUE) %>%
heat_tree(obj,
node_size = n_obs,
node_color = n_obs,
node_label = taxon_names,
tree_label = taxon_names)
Error:
Error in check_element_length(c("node_size", "edge_size", "node_label_size", :
Length of argument'edge_size' must be a factor of the length of 'taxon_id'
I tried setting the edge_size to taxon_names, but then I keep getting a similar error for edge_label, node_label_size, edge_label_size, etc.
Please let me know what I'm doing wrong here. Thank you so much!
Hi @tarunaaggarwal, the error is because you gave the obj to the heat_tree command even though you had already passed it in via the %>%, so it was trying to use the object for the next undefined parameter, which happened to be edge_size. Sorry for the unhelpful error message. It's a common enough mistake that I should write some code to look for it (#231).
If you have not used %>% before: it takes what came before and uses it as the first input to the next function. For example, the two examples below do the exact same thing:
obj %>%
filter_taxa(taxon_names == "Fungi", subtaxa = TRUE) %>%
heat_tree(node_size = n_obs,
node_color = n_obs,
node_label = taxon_names,
tree_label = taxon_names)
just_fungi <- filter_taxa(obj, taxon_names == "Fungi", subtaxa = TRUE)
heat_tree(just_fungi,
node_size = n_obs,
node_color = n_obs,
node_label = taxon_names,
tree_label = taxon_names)
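To see why passing the object twice produces an error about an unrelated parameter, here is a minimal sketch using a toy function. `toy_plot` and its arguments are made up for illustration and are not the real heat_tree signature. It uses base R's native pipe `|>` (R 4.1 or later), which for this purpose is assumed to behave like %>%.

```r
# Toy stand-in for a plotting function; NOT the real heat_tree().
toy_plot <- function(obj, node_size = NULL, edge_size = NULL) {
  list(obj = obj, node_size = node_size, edge_size = edge_size)
}

x <- c(a = 1, b = 2)

# Correct: the pipe supplies `x` as the first argument.
ok <- x |> toy_plot(node_size = 10)

# Mistake: `x` is piped in AND passed again, so the second copy
# lands in the next parameter that has not been named, here `edge_size`.
oops <- x |> toy_plot(x, node_size = 10)

is.null(ok$edge_size)         # TRUE
identical(oops$edge_size, x)  # TRUE
```

The second call is equivalent to `toy_plot(x, x, node_size = 10)`, which is why the complaint is about a parameter that was never set explicitly.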
@zachary-foster @tarunaaggarwal
Hello guys. From what I can tell the issue might not be with metacoder, but with the way you've parsed your data in phyloseq. After taking a look at your .biom files that you linked above it looks like you've used the SILVA database to annotate your data. Is that correct @tarunaaggarwal?
If you've used SILVA, it designates ranks with the D_0__, D_1__, etc. prefixes (unlike GreenGenes, which has prefixes that look something like k, p, c, etc.).
If so, you can use the function that I've created and submitted as a PR to phyloseq here: https://github.com/joey711/phyloseq/pull/854
If you have the function loaded, then you'll end up doing something like this:
# import biom data
silva_biom <- system.file("extdata", "SILVA_OTU_table.biom", package="phyloseq")
# Create phyloseq object using the SILVA parsing function
silva_phyloseq <- import_biom(BIOMfilename = silva_biom, parseFunction = parse_taxonomy_silva_128)
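As a rough illustration of the two prefix styles mentioned above, the base R sketch below guesses which style a lineage string uses. The lineage strings and the `prefix_style` helper are hypothetical and are not part of phyloseq or metacoder. The Greengenes pattern assumes single-letter rank codes followed by two underscores.

```r
# Hypothetical lineage strings in the two styles described above.
silva_lineage <- "D_0__Bacteria;D_1__Proteobacteria"
greengenes_lineage <- "k__Bacteria;p__Proteobacteria"

# Hypothetical helper: look at the first rank and classify its prefix.
prefix_style <- function(lineage) {
  first_rank <- strsplit(lineage, ";", fixed = TRUE)[[1]][1]
  if (grepl("^D_[0-9]+__", first_rank)) {
    "SILVA"
  } else if (grepl("^[kpcofgs]__", first_rank)) {
    "Greengenes"
  } else {
    "unknown"
  }
}

prefix_style(silva_lineage)       # "SILVA"
prefix_style(greengenes_lineage)  # "Greengenes"
```

A check like this can be run on a few rows of a taxonomy file before choosing a parse function.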
Hey @zachary-foster - Thank you for explaining the %>% usage. I have only used it a couple of times in the past and I keep forgetting that it operates like the |. It worked! Check out the lovely graph. Once I get the for loop to work, I will post my code in a separate issue page for anyone in the future who wishes to use it.
Hi @grabear - I did indeed use the SILVA database. You caught that fast! 👍 Thank you for sending your R code. I'm sure it will be a huge time saver for our lab since we use Phyloseq a bunch. So does this mean you use SILVA database as well? How do you like the SILVA taxonomy for Qiime? We have been thinking about fixing the taxonomy manually but it is such a HUGE task. Just wondering what your opinion is.
Thanks fellas! Appreciate your help!
The microbiome project I'm working on is the first one I've done. So it's also the first time I used metacoder. But in my journey I read this journal article.
I like that SILVA is more up to date (29/09/2016) vs GreenGenes (May 2013).
There's also this:
SILVA, being the largest of the three 16S based taxonomies, shares the most taxonomic units with NCBI
That whole article was helpful to me.
Thanks for your input @grabear!
@tarunaaggarwal:
Great! I am glad it worked. Anyone know where Ascomycota and Basidiomycota are in SILVA's taxonomy? Seems like they should be there.
Once I get the for loop to work, I will post my code in a separate issue page for anyone in the future who wishes to use it.
Cool, thanks!
Hey @zachary-foster - I'm getting back to metacoder this week, so I'm sorry I only just noticed that you asked a question in your response. I just grep'd for D_5__Ascomycota and D_5__Basidiomycota in the taxonomy file that comes with SILVA (v132) and found 3402 and 2443 OTUs containing D_5__Ascomycota and D_5__Basidiomycota, respectively, using the consensus_taxonomy_all_levels.txt file.
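The same kind of count can be done from R. The sketch below is a rough base R equivalent of the grep described above, run on made-up lines instead of the real file; `count_taxon` is a hypothetical helper.

```r
# Hypothetical taxonomy lines (made up; the real SILVA file is not reproduced here).
tax_lines <- c(
  "otu1\tD_0__Eukaryota;D_3__Fungi;D_5__Ascomycota",
  "otu2\tD_0__Eukaryota;D_3__Fungi;D_5__Basidiomycota",
  "otu3\tD_0__Eukaryota;D_3__Fungi;D_5__Ascomycota",
  "otu4\tD_0__Eukaryota;D_1__SAR"
)

# Count the lines that contain a label as a literal string (no regex).
count_taxon <- function(lines, label) {
  sum(grepl(label, lines, fixed = TRUE))
}

n_asco <- count_taxon(tax_lines, "D_5__Ascomycota")
n_basidio <- count_taxon(tax_lines, "D_5__Basidiomycota")
```

With a real file, `tax_lines` would come from something like `readLines()` on the taxonomy file.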
Were you using the file with 7 levels?
I believe Qiime uses the 7 level taxonomy file @tarunaaggarwal. Not sure of other software that uses SILVA, so that's my only reference.
Hey @grabear - I thought you can specify any taxonomy file with any number of levels in Qiime. Either way, if I need more levels, I just replace the taxonomy strings with all 14 levels using a quick Python script.
So how is Metacoder working out for you? I ask because I have been thinking of ways to make it work best for our lab. Ideally, I want to import the biom table (without taxonomy info), mapping file and a taxonomy file into metacoder without Phyloseq. Do you know how to do that?
@tarunaaggarwal, I just ask because I did not see any fungal phyla in your plot and having them there would make it look better perhaps. Did they get filtered out before plotting?
Hey @zachary-foster! Oh I see. Those levels were not present in the 7 level taxonomy file we used to classify the OTUs I believe. Hence, they were missing. I have another question for you Zach. Is it possible to import the biom table (without taxonomy info), mapping file and a taxonomy file into metacoder without Phyloseq?
So the biom file has an abundance matrix but no tax info? If you send me an example of each, I can probably tell you how to do it. Anything tabular for sure. I do have a biom parser as well, although I have never tried reading one without taxonomy info.
Thanks @zachary-foster ! Here is the folder containing both types of biom tables - with and without tax. The one without taxonomy info is accompanied with a taxonomy file. I hope I'm not creating too much work for you. THANKS for your help!
Is there some reason that you need to use a biom file without taxonomies?
@grabear Sort of. We just reassigned taxonomy with SILVA 132 and I'd rather just find a way to work with the new taxonomy within R than to have to add to the biom table and refilter all over again. If it's not feasible, it's not the end of the world.
@tarunaaggarwal, no problem! I found a way to do it:
library(metacoder)
with_tax <- parse_qiime_biom("biom-table-with-tax/otu_table_mc2_w_tax_BlankOTUsRemoved_BlankSamplesRemoved.biom")
print(with_tax)
## <Taxmap>
## 521 taxa: ab. D_0__Eukaryota ... ub. D_11__Norrlinia[truncated]
## 521 edges: NA->ab, ab->ac, ab->ad ... kd->tz, lq->ua, lr->ub
## 1 data sets:
## otu_table:
## # A tibble: 30,871 x 283
## taxon_id otu_id MEMB.nem.105 MEMB.nem.117 MEMB.nem.156
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 bh FJ48040… 0. 0. 0.
## 2 ls AF50812… 0. 0. 0.
## 3 lt AY63045… 0. 0. 0.
## # ... with 3.087e+04 more rows, and 278 more variables:
## # MEMB.nem.157 <dbl>, MEMB.nem.167 <dbl>,
## # MEMB.nem.168 <dbl>, MEMB.nem.176 <dbl>,
## # MEMB.nem.190 <dbl>, MEMB.nem.198 <dbl>,
## # MEMB.nem.200 <dbl>, MEMB.nem.22 <dbl>,
## # MEMB.nem.232 <dbl>, MEMB.nem.272 <dbl>, …
## 0 functions:
# Get OTU abundance
without_tax <- biomformat::read_biom("biom-table-without-tax/otu_table_mc2.biom")
## Warning in strsplit(msg, "\n"): input string 1 is invalid in this locale
otu_table <- dplyr::as_tibble(as.matrix(biomformat::biom_data(without_tax)))
# Get taxonomy file
taxonomy <- readr::read_tsv("biom-table-without-tax/rep_set_tax_assignments.txt",
col_names = c("otu_id", "tax", "some_number"))
## Parsed with column specification:
## cols(
## otu_id = col_character(),
## tax = col_character(),
## some_number = col_double()
## )
# Combine both in a taxmap object
obj <- parse_tax_data(taxonomy,
class_cols = "tax", class_sep = ";",
datasets = list(otu_table = otu_table),
mappings = c("{{index}}" = "{{index}}"))
print(obj)
## <Taxmap>
## 2375 taxa: aab. D_0__Eukaryota ... dnj. D_14__
## 2375 edges: NA->aab, aab->aac ... czz->dni, daa->dnj
## 2 data sets:
## tax_data:
## # A tibble: 31,848 x 4
## taxon_id otu_id tax some_number
## <chr> <chr> <chr> <dbl>
## 1 dab New.Refe… D_0__Eukaryota;D_1__Opis… 0.700
## 2 amc New.Refe… D_0__Eukaryota;D_1__Opis… 0.820
## 3 amd New.Clea… D_0__Eukaryota;D_1__Opis… 1.00
## # ... with 3.184e+04 more rows
## otu_table:
## # A tibble: 31,848 x 305
## taxon_id MEMB.nem.105 MEMB.nem.117 MEMB.nem.156
## <chr> <dbl> <dbl> <dbl>
## 1 dab 4. 1. 36.
## 2 amc 0. 0. 0.
## 3 amd 0. 0. 0.
## # ... with 3.184e+04 more rows, and 301 more variables:
## # MEMB.nem.157 <dbl>, MEMB.nem.167 <dbl>,
## # MEMB.nem.168 <dbl>, MEMB.nem.176 <dbl>,
## # MEMB.nem.190 <dbl>, MEMB.nem.198 <dbl>,
## # MEMB.nem.200 <dbl>, MEMB.nem.22 <dbl>,
## # MEMB.nem.232 <dbl>, MEMB.nem.272 <dbl>, …
## 0 functions:
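One caveat about the index-based mapping used above: it pairs rows by position, so it relies on the abundance table and the taxonomy file listing OTUs in the same order. Whether that holds for these files is not confirmed in this thread. A minimal base R sketch of aligning the two by OTU ID first, using made-up data and hypothetical names:

```r
# Hypothetical toy data standing in for the real abundance matrix and taxonomy file.
counts <- matrix(c(4, 0, 1, 0, 36, 0), nrow = 3,
                 dimnames = list(c("otu_b", "otu_a", "otu_c"), c("s1", "s2")))
taxonomy <- data.frame(
  otu_id = c("otu_a", "otu_b", "otu_c"),
  tax = c("D_0__Eukaryota;D_1__X", "D_0__Eukaryota;D_1__Y", "D_0__Eukaryota;D_1__Z"),
  stringsAsFactors = FALSE
)

# Keep the OTU IDs as a column instead of losing them with the row names.
otu_table <- data.frame(otu_id = rownames(counts), counts,
                        row.names = NULL, stringsAsFactors = FALSE)

# Reorder the abundance rows so they follow the taxonomy table.
otu_table <- otu_table[match(taxonomy$otu_id, otu_table$otu_id), ]
rownames(otu_table) <- NULL

# Fail loudly if the two tables still do not line up.
stopifnot(identical(otu_table$otu_id, taxonomy$otu_id))
```

Once the rows are aligned like this, pairing them by index is safe.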
If you want to remove the "D_0__" and things like "(animals)" from the names, you can use the class_key and class_regex options to do that if you know some regex.
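For anyone less familiar with regex, here is a minimal base R sketch of the kind of pattern involved. It only shows how capture groups split a name into a rank number and a taxon name; it does not call metacoder, and the entries are made up to resemble the names in this thread.

```r
# Hypothetical classification entries, styled after the names in this thread.
entries <- c("D_0__Eukaryota", "D_3__Metazoa (animals)", "D_11__Norrlinia")

# Two capture groups: the rank number and the taxon name.
pattern <- "^D_([0-9]+)__(.*)$"
parts <- regmatches(entries, regexec(pattern, entries))

# Element 1 of each match is the whole string; 2 and 3 are the groups.
rank_number <- as.integer(vapply(parts, `[`, character(1), 2))
taxon_name <- vapply(parts, `[`, character(1), 3)

# Drop a trailing parenthetical such as " (animals)".
taxon_name <- sub(" *\\([^)]*\\)$", "", taxon_name)

data.frame(rank_number, taxon_name)
```

The assumption here is that each capture group in a class_regex pattern corresponds to one entry of class_key, so a pattern like the one above would be a starting point to adapt.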
@zachary-foster Nice! I will try this right away. And how do I import the mapping file, please?
Oh yea, the mapping file. It has no taxonomic info associated with it, so it can stay a separate table. The taxmap object is only concerned with data that has taxonomic info associated with it, unlike phyloseq objects. You could put the mapping file in there, but it would not help anything. I like the readr package for tabular data:
mapping <- readr::read_tsv("18S-euk-QIIME-mapping-MEmicrobiome-FINAL-26Jul17.txt")
If you want, you could add it to the taxmap object like so:
obj$data$mapping <- readr::read_tsv("18S-euk-QIIME-mapping-MEmicrobiome-FINAL-26Jul17.txt")
obj$data is a list, so you can put anything you want there, but it will not make things easier unless what you put there is named by taxon IDs.
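A separate mapping table is still useful for looking up which sample columns belong to which group. The base R sketch below uses made-up data; the column names `SampleID` and `Host` and the group labels are assumptions, not taken from the real mapping file.

```r
# Hypothetical mapping table (tab-separated), read from an inline string.
mapping <- read.delim(
  text = "SampleID\tHost\ns1\tBirch\ns2\tSpruce\ns3\tBirch",
  stringsAsFactors = FALSE
)

# Hypothetical per-taxon counts with one column per sample.
otu_counts <- data.frame(
  taxon_id = c("ab", "ac"),
  s1 = c(4, 0), s2 = c(1, 2), s3 = c(36, 5)
)

# Use the mapping table to pick the sample columns for one group.
birch_samples <- mapping$SampleID[mapping$Host == "Birch"]
birch_totals <- rowSums(otu_counts[, birch_samples])
```

The same lookup gives the sample ID and group vectors that per-group comparisons need.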
Morning @zachary-foster - may I please have your email address?
Sure, it's zacharyfoster1989 a gmail.com
Thanks! I emailed ya.
Closing due to inactivity. If there are still unresolved issues, feel free to reopen this issue or open a new issue.
Hi @zachary-foster, I am trying to make a differential heat tree for two categories (birch vs spruce). Everything works fine when I plot the overall tree and the trees for the individual categories, but when I try to make a differential tree comparing the two categories based on the log2 mean ratio, I get an error (see code and error below).
metacoder.data$taxon_names
families$data$diff_table <- compare_groups(metacoder.data,
    data = "tax_by_host", # _by_host OR specify which information to use - here it is the abundance of the reads - you could also do the proportion in the samples using tax_data instead
    cols = c("Birch","Spruce"), # specify the names of the columns containing your OTU counts
    groups = c("Birch","Spruce"))
families$data$diff_table$log2_median_ratio
heat_tree(families,
    node_label = taxon_names, # specifies what names to give the circles (nodes)
    node_size = n_obs, # size nodes by total number of reads
    node_color = log2_median_ratio, # specifies the differences between groups
    #edge_size = total, # thickness of lines determined by total # of reads
    node_color_interval = c(-2, 2),
    node_color_range = c("cadetblue", "grey75", "darkorange1"),
    node_label_size_range = c(0.007, 0.06), # adjust the min-max numbers to change the relative size of the text
    node_size_axis_label = "Total Taxon Abundance",
    node_color_axis_label = "Differentially Occurring Samples")
Error in check_element_length(c("node_size", "edge_size", "node_label_size", : Length of argument'node_color' must be a factor of the length of 'taxon_id'
There seems to be a problem with how node_color is defined, and I am not able to figure out how to get it right.
Looking forward to hearing from you.
Regards, Sunil
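The wording of the error suggests that the values given to node_color must have a length that divides the number of taxa evenly, so they can be recycled across all taxa. The sketch below illustrates that rule with a hypothetical helper; it is not metacoder's actual check_element_length(), and the numbers are made up.

```r
# Hypothetical helper; NOT metacoder's actual check_element_length().
# TRUE when `values` can be recycled evenly across `n_taxa` taxa.
length_fits <- function(values, n_taxa) {
  length(values) > 0 && n_taxa %% length(values) == 0
}

n_taxa <- 12

length_fits(rep(0.5, 12), n_taxa)  # TRUE: one value per taxon
length_fits(1, n_taxa)             # TRUE: a single value is recycled
length_fits(rep(0.5, 5), n_taxa)   # FALSE: 5 does not divide 12
```

If the table holding log2_median_ratio was computed from a different object than the one being plotted, its row count may not match the number of taxa. That is only a guess at the cause here, but comparing those two numbers is a quick first check.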
Hi Sunil,
What does the print out for families look like before plotting? Thanks
Can you point me to a tutorial or accessor function for how to take an OTU table + metadata and convert this to a taxmap object, which can then be used to make plots? Is this possible with the package? I see that it is in the future directions in the paper, but am unable to figure it out myself (although am able to run all of the software and the examples!). Thanks so much for the help.