grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation

Error with parse_tax_data and installation issues #334

Open zachary-foster opened 2 years ago

zachary-foster commented 2 years ago
Transferred from https://github.com/ropensci/taxa/issues/210 for @emankhalaf

I have a feature table with taxonomy collapsed to the genus level, where the first column is the taxonomy (ranks separated by ;) and the remaining columns are sample IDs showing the read count of each feature. I need to split the taxonomy column into 6 taxonomic ranks using the parse_tax_data function. I used this code:

```
obj <- parse_tax_data(feature-table-with-taxonomyl6,
                      class_cols = "taxonomy",
                      class_sep = ";",
                      class_regex = "^([a-z]{0,1})_{0,2}(.*)$",
                      class_key = c("tax_rank" = "taxon_rank", "name" = "taxon_name"))
print(obj)
```

Then I got this error:

```
Error in parse_tax_data(feature - table - with - taxonomyl6, class_cols = "taxonomy", : could not find function "parse_tax_data"
```

I have already loaded the taxa package, but I had a problem when installing devtools.

Thanks! Eman

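For readers hitting the same "could not find function" message: the thread treats parse_tax_data as a metacoder function, so loading only taxa may not be enough. The following is a minimal sketch, assuming metacoder is installed and exports parse_tax_data in your version. The toy table and the name feature_table are hypothetical and stand in for the real file. Note also that an unquoted name containing hyphens is read by R as a subtraction, which is why the error message shows `feature - table - with - taxonomyl6`.

```r
library(metacoder)  # assumption: parse_tax_data() is exported by metacoder

# Hypothetical toy table, not the file from this issue
feature_table <- data.frame(
  taxonomy = c("d__Bacteria;p__Firmicutes;c__Bacilli",
               "d__Bacteria;p__Bacteroidota;c__Bacteroidia"),
  S1 = c(10, 0),
  S2 = c(3, 7),
  stringsAsFactors = FALSE
)

# Same arguments as in the question, applied to a syntactically valid name
obj <- parse_tax_data(feature_table,
                      class_cols = "taxonomy",
                      class_sep = ";",
                      class_regex = "^([a-z]{0,1})_{0,2}(.*)$",
                      class_key = c(tax_rank = "taxon_rank", name = "taxon_name"))
print(obj)
```
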
zachary-foster commented 2 years ago

Can you give me part of the input data so I can see how it is formatted?

> I need to split the taxonomy column into 6 taxonomic ranks

If you are just trying to split the taxonomy column into 6 per-rank columns and don't need other metacoder functions that require the taxmap objects produced by parse_tax_data, you can use:

library(tidyr)
# Backticks are needed because the object name contains hyphens
separate(`feature-table-with-taxonomyl6`, taxonomy, c("Kingdom", "Class", "Order", "etc..."), sep = ';')

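As an illustration of the separate() approach, here is a small self-contained sketch. The data frame, its values, and the name feature_table are made up for the example, and only three ranks are used to keep it short.

```r
library(tidyr)

# Hypothetical toy feature table (not the file from this issue)
feature_table <- data.frame(
  taxonomy = c("d__Bacteria;p__Firmicutes;c__Bacilli",
               "d__Bacteria;p__Bacteroidota;c__Bacteroidia"),
  S1 = c(10, 0),
  S2 = c(3, 7),
  stringsAsFactors = FALSE
)

# One new column per rank; the number of names should match the
# number of ";"-separated fields in the taxonomy strings
ranks <- c("Kingdom", "Phylum", "Class")
split_table <- separate(feature_table, taxonomy, into = ranks, sep = ";")
split_table
```

The original taxonomy column is replaced by the per-rank columns, and the sample columns are left unchanged.
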
emankhalaf commented 2 years ago

I did the following:

my_table <- read_csv("file.csv", col_names = TRUE) #  readr function
GT <- separate(my_table, taxonomy, c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"), sep = ";")
head(GT)

I got this error:

Error:
! Must extract column with a single valid subscript.
x Subscript `var` has the wrong type `function`.
ℹ It must be numeric or character.
Backtrace:
  1. tidyr::separate(...)
  2. tidyr:::separate.data.frame(...)
  3. tidyselect::vars_pull(names(data), !!enquo(col))
  4. tidyselect:::pull_as_location2(loc, n, vars)
 13. vctrs::vec_as_subscript2(i, arg = "var", logical = "error")
 14. vctrs:::result_get(...)
 Error: 
x Subscript `var` has the wrong type `function`.
ℹ It must be numeric or character.

Any recommendations here? Many thanks!

zachary-foster commented 2 years ago

What does the table look like?

emankhalaf commented 2 years ago

It is a feature table with taxonomy, originally a txt file that I converted to csv. The first row is the header (taxonomy, S1, S2, ...). The row names are the taxonomy (d_kingdom up to s_species), followed by the abundance/read count of each feature across samples. I can email the file to you if you do not mind!

Thank you! Eman

zachary-foster commented 2 years ago

Yea, it would be helpful if you emailed the file to me or attached it here.

zacharyfoster1989@gmail.com

zachary-foster commented 2 years ago

You have a column at the end named taxonomy too. Since two columns have the same name, readr::read_csv renames both of them (to taxonomy...1 and taxonomy...58), so there is no longer a column called taxonomy, which is why your code did not work. Note that readr::read_csv reports the renaming under "New names:" in the output below. Does this do what you wanted?

library(readr)
library(tidyr)
my_table <- read_csv("~/Downloads/feature-table-with-taxonomyl6.csv", col_names = TRUE) #  readr function
#> New names:
#> * taxonomy -> taxonomy...1
#> * taxonomy -> taxonomy...58
#> Rows: 308 Columns: 58
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (1): taxonomy...1
#> dbl (56): 1P-GH-R1, 1P-GH-R2, P1, P10, P11, P12b, P13, P14b, P15, P16, P17, ...
#> lgl  (1): taxonomy...58
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
GT <- separate(my_table, "taxonomy...1", c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus"), sep = ";") # No species rank in data
GT # Don't need to use head() for tibbles
#> # A tibble: 308 × 63
#>    Kingdom     Phylum Class Order Family Genus `1P-GH-R1` `1P-GH-R2`    P1   P10
#>    <chr>       <chr>  <chr> <chr> <chr>  <chr>      <dbl>      <dbl> <dbl> <dbl>
#>  1 d__Bacteria p__Pr… c__G… o__E… f__Er… g__P…          0          0  8276  6048
#>  2 d__Bacteria __     __    __    __     __             0          0     0     2
#>  3 d__Bacteria p__Ch… c__D… o__S… f__S0… g__S…          0          0     0    53
#>  4 d__Bacteria p__Ba… c__B… o__S… f__Sp… g__S…          0          0     0     0
#>  5 d__Bacteria p__Fi… c__B… o__B… f__Ba… g__B…          0          0     0  1283
#>  6 d__Bacteria p__Ba… c__B… o__C… __     __             0          0     0     0
#>  7 d__Bacteria p__Fi… c__S… o__S… f__Sy… g__C…          0          0     0     0
#>  8 d__Bacteria p__Ba… c__B… o__C… f__Cy… g__S…          0          0    26    11
#>  9 d__Bacteria p__Pr… c__G… __    __     __             0          0     0     0
#> 10 d__Bacteria p__Ba… c__B… o__F… f__We… g__C…          0          0    34    32
#> # … with 298 more rows, and 53 more variables: P11 <dbl>, P12b <dbl>,
#> #   P13 <dbl>, P14b <dbl>, P15 <dbl>, P16 <dbl>, P17 <dbl>, P19 <dbl>,
#> #   P2 <dbl>, P20 <dbl>, P21 <dbl>, P22 <dbl>, P23 <dbl>, P24 <dbl>, P25 <dbl>,
#> #   P26 <dbl>, P27 <dbl>, P28 <dbl>, P29 <dbl>, P31 <dbl>, P32 <dbl>,
#> #   P33 <dbl>, P34b <dbl>, P35 <dbl>, P36b <dbl>, P37 <dbl>, P38 <dbl>,
#> #   P39b <dbl>, P40b <dbl>, P41 <dbl>, P42 <dbl>, P43 <dbl>, P44 <dbl>,
#> #   P45 <dbl>, P46 <dbl>, P47 <dbl>, P48 <dbl>, P49 <dbl>, P4b <dbl>, …

Created on 2022-03-02 by the reprex package (v2.0.1)
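
A quick way to catch this kind of problem before splitting is to look for repeated names in the header. The sketch below uses base R's read.csv (not readr) on a made-up CSV string with the same defect, a second, empty taxonomy column at the end. The variable names are hypothetical.

```r
# Hypothetical CSV with a duplicated, empty "taxonomy" column at the end
csv_text <- paste(
  "taxonomy,S1,S2,taxonomy",
  "d__Bacteria;p__Firmicutes,10,3,",
  "d__Bacteria;p__Bacteroidota,0,7,",
  sep = "\n"
)

# check.names = FALSE keeps the header exactly as written in the file
raw_table <- read.csv(text = csv_text, check.names = FALSE,
                      stringsAsFactors = FALSE)

# Names that occur more than once in the header
dup_names <- unique(names(raw_table)[duplicated(names(raw_table))])
dup_names

# Keep only the first column for each name, dropping the empty duplicate
clean_table <- raw_table[!duplicated(names(raw_table))]
names(clean_table)
```

Dropping the later duplicate is only appropriate when it carries no data, as was the case here. Otherwise the columns should be renamed instead.
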

emankhalaf commented 2 years ago

@zachary-foster Thank you so much! Now it works. I exported the file as tsv and deleted the extra taxonomy column.

zachary-foster commented 2 years ago

No problem! Glad it's working.