tax_data seems missing after parse_tax_data step from the tutorial

cmonat commented 5 years ago

Hi,

I'm trying to redo the tutorial on my dataset. I first tried it without changing my dataset, but when I got to the parse_tax_data step I got an error message. So I tried modifying my taxonomy to match yours, but I still get strange results. For example, when I tried:

obj1 <- parse_tax_data(hmp_otus1, class_cols = "lineage", class_sep = ";", class_regex = "^(.+)__(.+)$", class_key = c(tax_rank = "info", tax_name= "taxon_name"))

I obtained the following error message:

Error: No item named "lineage" in the following inputs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ... 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37

I got around this by passing only the lineage column:

obj1 <- parse_tax_data(hmp_otus1$lineage, class_sep = ";", class_regex = "^(.+)__(.+)$", class_key = c(tax_rank = "info", tax_name= "taxon_name"))

But after that, when I print obj1, the tax_data part seems to be incomplete. This is what I get:

tax_data: a named vector of 'character' with 1767 items

On the other hand, class_data looks similar to yours.

I tried to continue anyway, but at the step that removes low-abundance counts, R told me that the object supplied is not a taxmap object and that it appears to be of type "MRexperiment".

Do you have any idea what I should change to make it work? I hope everything is clear. If you need further information, please let me know. Thank you very much in advance.

Cheers C.
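A side note on the workaround above: hmp_otus1$lineage extracts a single column, so parse_tax_data() only receives a bare character vector of lineages and the otu_id and count columns are left behind. That would explain why tax_data printed as "a named vector of 'character' with 1767 items". The toy list below uses made-up values (not the real data) to show the difference in base R:

```r
# Hypothetical toy stand-in for hmp_otus1: a named list of equal-length columns
toy_otus <- list(
  otu_id  = c("Cluster_58", "Cluster_284"),
  lineage = c("d__ Bacteria ; p__Phylum_A", "d__ Bacteria ; p__Phylum_B"),
  X1      = c(253L, 267L)
)

# `$lineage` pulls out one column: just a character vector
lineage_only <- toy_otus$lineage
class(lineage_only)    # "character"
length(lineage_only)   # 2: one entry per OTU, but no otu_id or count columns

# The full object still carries every column
names(toy_otus)        # "otu_id" "lineage" "X1"
```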

zachary-foster commented 5 years ago

Hello @cmonat,

What does the print out for hmp_otus1 look like?

cmonat commented 5 years ago

Hi,

The printout of hmp_otus1 is too big to copy-paste here, but it shows every column of the hmp_otus1 data.frame in detail. First it prints the $'otu_id' column with the names of my clusters (for example "Cluster_58", "Cluster_284", ...), then the $'lineage' column with taxonomy that looks like yours, and then my first result column, $'X1', with the values for every cluster. I hope this description is clear enough to help you figure out what I should change to make it work.

Thank you very much for your help. Have a great day

Cheers C.

On Fri, 12 Jul 2019 at 17:56, Zachary Foster <notifications@github.com> wrote:

Hello @cmonat,

What does the print out for hmp_otus1 look like?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/grunwaldlab/metacoder_documentation/issues/6?email_source=notifications&email_token=ABXHGO2RURNS2DNHOYTFXWDP7CSTHA5CNFSM4ICLXHD2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZ2FDKQ#issuecomment-510939562, or mute the thread https://github.com/notifications/unsubscribe-auth/ABXHGO7RC6HMV22G3JGRBCTP7CSTHANCNFSM4ICLXHDQ .

zachary-foster commented 5 years ago

Hi @cmonat, can you email me the file created by save(hmp_otus1, file = 'hmp_otus1.RData')? It's hard for me to figure out what the problem is without reproducing the error.

zachary-foster commented 5 years ago

Thanks for emailing me the data.

The issue is that hmp_otus1 is a list instead of a table. If you convert it to a data.frame with as.data.frame or to a tibble with as_tibble, it will work. It's an easy fix, and an easy thing to miss, since tables in R are also a type of list:

is.list(data.frame(x = 1:10))
#> [1] TRUE

Created on 2019-07-15 by the reprex package (v0.3.0)
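The check that separates the two cases is is.data.frame(), not is.list(): a data.frame passes both, while a plain list of columns only passes the first. The error in the first post listed inputs 1 to 37, which matches the 37 columns, so it looks as though each column of the list was treated as a separate input (an inference, not something confirmed in this thread). A minimal base-R check and conversion, using a made-up toy list rather than the real data:

```r
# Hypothetical toy list shaped like hmp_otus1: named, equal-length columns
toy_otus <- list(
  otu_id  = c("Cluster_58", "Cluster_284"),
  lineage = c("d__ Bacteria ; p__Phylum_A", "d__ Bacteria ; p__Phylum_B"),
  X1      = c(253L, 267L)
)

is.list(toy_otus)        # TRUE
is.data.frame(toy_otus)  # FALSE: it is not a table yet

# as.data.frame() turns the list of columns into a proper table;
# stringsAsFactors = FALSE keeps otu_id and lineage as character columns
toy_df <- as.data.frame(toy_otus, stringsAsFactors = FALSE)
is.data.frame(toy_df)    # TRUE
dim(toy_df)              # 2 rows, 3 columns
```

tibble::as_tibble(toy_otus) would work the same way if you prefer tibbles.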

If you do this, it should work:

load('~/Downloads/hmp_otus1.RData')

library(metacoder)
#> Loading required package: taxa
#> This is metacoder verison 0.3.2.9002 (development version)
library(tibble)

hmp_otus1 <- as_tibble(hmp_otus1)
hmp_otus1
#> # A tibble: 1,767 x 37
#>    otu_id lineage    X1   X10   X11   X12   X13   X14   X15   X16   X18
#>    <chr>  <chr>   <int> <int> <int> <int> <int> <int> <int> <int> <int>
#>  1 Clust… d__ Ba…   253   184   128   196   134    87   137    89    31
#>  2 Clust… d__ Ba…   267     0     2     4     0     0     1     0     0
#>  3 Clust… d__ Ba…   300     6     6     6     0     0     0     0     0
#>  4 Clust… d__ Ba…   101    82   149   169   190   214   215   171   178
#>  5 Clust… d__ Ba…  1157    28    14    17    17     6     1     1     0
#>  6 Clust… d__ Ba…    40    10     5     7     2     0     3     3     0
#>  7 Clust… d__ Ba…   236   215   446   392   298   223   369   261   426
#>  8 Clust… d__ Ba…    39    46    69    92    69    70    98    62    86
#>  9 Clust… d__ Ba…     3     0     6     6     5     7     4     7     5
#> 10 Clust… d__ Ba…     4     9     5    10     2     6     5    10     1
#> # … with 1,757 more rows, and 26 more variables: X19 <int>, X2 <int>,
#> #   X20 <int>, X21 <int>, X22 <int>, X23 <int>, X24 <int>, X25 <int>,
#> #   X26 <int>, X27 <int>, X28 <int>, X29 <int>, X3 <int>, X30 <int>,
#> #   X31 <int>, X32 <int>, X33 <int>, X34 <int>, X35 <int>, X36 <int>,
#> #   X4 <int>, X5 <int>, X6 <int>, X7 <int>, X8 <int>, X9 <int>
obj1 <- parse_tax_data(hmp_otus1,
                       class_cols = "lineage",
                       class_sep = " ; ",
                       class_regex = "^(.+)__ ?(.+)$",
                       class_key = c(tax_rank = "info", tax_name= "taxon_name"))
obj1
#> <Taxmap>
#>   1147 taxa: aab. Bacteria ... bsd. Multi-affiliation
#>   1147 edges: NA->aab, aab->aac ... bcm->bsc, bcn->bsd
#>   2 data sets:
#>     tax_data:
#>       # A tibble: 1,767 x 38
#>         taxon_id otu_id lineage    X1   X10   X11   X12   X13
#>         <chr>    <chr>  <chr>   <int> <int> <int> <int> <int>
#>       1 bco      Clust… d__ Ba…   253   184   128   196   134
#>       2 bcp      Clust… d__ Ba…   267     0     2     4     0
#>       3 bcq      Clust… d__ Ba…   300     6     6     6     0
#>       # … with 1,764 more rows, and 30 more variables:
#>       #   X14 <int>, X15 <int>, X16 <int>, X18 <int>, X19 <int>,
#>       #   X2 <int>, X20 <int>, X21 <int>, X22 <int>, X23 <int>,
#>       #   …
#>     class_data:
#>       # A tibble: 12,369 x 5
#>         taxon_id input_index tax_rank tax_name     regex_match   
#>         <chr>          <int> <chr>    <chr>        <chr>         
#>       1 aab                1 d        Bacteria     d__ Bacteria  
#>       2 aac                1 p        Verrucomicr… p__Verrucomic…
#>       3 aay                1 c        Verrucomicr… c__Verrucomic…
#>       # … with 1.237e+04 more rows
#>   0 functions:

Created on 2019-07-15 by the reprex package (v0.3.0)
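To see what the modified class_sep and class_regex values change, the splitting and matching can be imitated in base R on a made-up lineage string in the same format (a space after "d__", " ; " between ranks). This only mimics the behaviour with strsplit() and sub(); it is not how parse_tax_data() is implemented internally.

```r
# Hypothetical lineage string in the same format as the data above
lineage <- "d__ Bacteria ; p__Phylum_A ; c__Class_B"

# Tutorial settings: split on ";" and match "^(.+)__(.+)$"
old_parts <- strsplit(lineage, ";", fixed = TRUE)[[1]]
old_rank  <- sub("^(.+)__(.+)$", "\\1", old_parts)
old_name  <- sub("^(.+)__(.+)$", "\\2", old_parts)
old_rank   # "d" " p" " c": leading spaces on the ranks
old_name   # " Bacteria " "Phylum_A " "Class_B": stray spaces on the names

# Modified settings: split on " ; " and allow one optional space after "__"
new_parts <- strsplit(lineage, " ; ", fixed = TRUE)[[1]]
new_rank  <- sub("^(.+)__ ?(.+)$", "\\1", new_parts)
new_name  <- sub("^(.+)__ ?(.+)$", "\\2", new_parts)
new_rank   # "d" "p" "c"
new_name   # "Bacteria" "Phylum_A" "Class_B"
```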

A few things to note:

You don't need to call the variables hmp_otus or obj; that's just what I called them. Use names that make sense to you. The table names (e.g. tax_data) are also arbitrary, so you can rename them with names(obj$data) <- c(...) if you want.
Your lineage info has a slightly different format from the tutorial data, so I modified class_sep and class_regex so that the taxon names don't end up with spaces around them.
The X# column names look automatically generated by R. Perhaps the CSV you imported did not have column names? I recommend using unique sample identifiers as column names, even if they are just numbers, so that you can have a sample metadata table like hmp_samples.
I used tibbles in the example above, but you can use plain data.frames if you want. Tibbles are a type of data.frame with nicer printing, but otherwise they behave mostly the same.
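On the X# column names: another way R can produce them, worth ruling out alongside a missing header, is read.csv()'s default check.names = TRUE, which passes the header through make.names() and prefixes "X" to any name that starts with a digit. A small check with made-up inline CSV text:

```r
# Made-up CSV text whose sample columns are headed by bare numbers
csv_text <- "otu_id,1,10\nCluster_58,253,184\nCluster_284,267,0"

# Default: check.names = TRUE runs make.names(), which prefixes "X"
default_names <- names(read.csv(text = csv_text))
default_names  # "otu_id" "X1" "X10"

# check.names = FALSE keeps the header exactly as written in the file
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))
raw_names      # "otu_id" "1" "10"
```

Either set of names can serve as sample identifiers, as long as the sample metadata table uses the same IDs.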

cmonat commented 5 years ago

Hi Zachary,

We finally managed to get to the multi-comparison plot!!! Here is a little picture of our beautiful result (thanks to you), and thank you for your really nice tutorial and for your quick and efficient answers by email. [image: image.png] By the way, do you think it would be possible to increase the size of the taxonomical label in the big plot?

Cheers

Cécile & Benoit


zachary-foster commented 5 years ago

Great, I am glad it worked!

By the way, do you think it would be possible to increase the size of the taxonomical label in the big plot?

Do you mean the legend or the taxon label text size?

cmonat commented 5 years ago

Hi,

I mean the taxon label text size. Thank you

Have a great day. Cheers,

C.


zachary-foster commented 5 years ago

OK, yeah, you can do that in a few ways. One is to force the labels into a fixed size range with node_label_size_range = c(0.01, 0.03), which sets the minimum and maximum label size. See the following for more information:

https://grunwaldlab.github.io/metacoder_documentation/faq.html
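A minimal sketch of the node_label_size_range option, assuming metacoder is installed and using its bundled hmp_otus example data from the tutorial (only the first 50 OTUs, to keep it quick). The c(0.01, 0.03) range is the one suggested above. For the multi-comparison plot, heat_tree_matrix() appears to pass extra arguments on to heat_tree(), so the same option can likely be given there too; that is an assumption worth checking against the FAQ.

```r
library(metacoder)

# Tutorial example data bundled with metacoder; first 50 OTUs only, for speed
otus <- hmp_otus[1:50, ]
obj <- parse_tax_data(otus,
                      class_cols = "lineage",
                      class_sep = ";",
                      class_regex = "^(.+)__(.+)$",
                      class_key = c(tax_rank = "info", tax_name = "taxon_name"))

# Constrain the taxon label sizes to lie between 0.01 and 0.03
p <- heat_tree(obj,
               node_label = taxon_names,
               node_size = n_obs,
               node_label_size_range = c(0.01, 0.03))
print(p)
```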
