Closed: lfenske-93 closed this issue 8 months ago
I had a similar problem getting my data into the package. I started out by trying to follow the "EXAMPLE ANALYSIS" on the following page: https://grunwaldlab.github.io/analysis_of_microbiome_community_data_in_r/index.html. Then I went to the WORK SHOP menu on the same page and found a collection of pages that take you step by step through the analysis. I found the "REQUIRED DATASETS" page very helpful! It has example files that illustrate exactly the format you need. I also made sure to install all the necessary software and dependencies listed on the REQUIRED SOFTWARE page. After that, I just followed the remaining pages and had success. Hope this helps.
This should be doable with parse_tax_data. It is extremely flexible and I recommend looking at its help page and examples. I can recommend a way to parse your data, but I am confused about the format. You say:
My dataset looks like this, with all columns tab-separated.
But it looks like it is separated by |. Can you attach a subset of the data as a file? Thanks!
Hi,
Sorry for the confusion. The dataset is tab-separated; I just tried to post a somewhat understandable example here. 😅
I attached a subset of my dataset. I tried with parse_tax_data but I probably just didn't do it quite right.
Many thanks for your help! Best regards, Linda
No worries! Thanks for the example data. Here is how to parse that data format:
library(metacoder)
#> This is metacoder version 0.3.6 (stable)
library(readr)
raw_data <- read_tsv('~/Downloads/taxinfo_subset.csv')
#> Rows: 28 Columns: 7
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (7): domain, phylum, class, order, family, genus, species
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
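# Columns 1-7 hold the classification (domain to species); named_by_rank = TRUE keeps the column names as rank info
x <- parse_tax_data(raw_data, class_cols = 1:7, named_by_rank = TRUE)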
print(x)
#> <Taxmap>
#> 44 taxa: ab. Bacteria, ac. Firmicutes ... br. coli, bs. subtilis
#> 44 edges: NA->ab, ab->ac, ab->ad ... be->bq, bf->br, bg->bs
#> 1 data sets:
#> tax_data:
#> # A tibble: 28 × 8
#> taxon_id domain phylum class order family genus species
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 bh Bacteria Firmicutes Bacil… Lact… Lacto… Pauc… hokkai…
#> 2 bi Bacteria Firmicutes Bacil… Lact… Lacto… Secu… oryzae
#> 3 bj Bacteria Firmicutes Bacil… Lact… Strep… Stre… pyogen…
#> # ℹ 25 more rows
#> 0 functions:
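# Plot the taxonomy; node size and color show the number of observations (rows of tax_data) per taxon
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs)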
Created on 2023-11-17 with reprex v2.0.2
Many thanks! Then I wasn't that far off with my attempt after all.
I'm looking forward to playing around with it a bit. It's a great thing you've created. ❤️
No problem! Thank you!
Hi,
I stumbled across your tool and have the feeling that it is just right for my application, but even with the help of your documentation I haven't quite figured out if and how I can use it for my data set.
I have taxonomic data from GTDBtk and my aim is to map the bias within this data, i.e. to show which taxa are particularly abundant.
My dataset looks like this, with all columns tab-separated. And I'm trying to find out how exactly I can convert this data set into a taxmap object or what I need to do first.
This was my first attempt, but it doesn't look right and I'm struggling to create a heat_tree out of this:
Perhaps one of you can give me a brief idea of how exactly I can work with my data set. It seems to be a relatively simple example, but unfortunately I'm still stuck.
Kind regards, Linda