Open grabear opened 6 years ago
> length(m$data$diff_table$wilcox_p_value)
[1] 420
> length(m$taxon_ids())
[1] 239
> length(unique(m$data$diff_table$taxon_id))
[1] 239
> m
<Taxmap>
239 taxa: ac. Bacteria, af. Firmicutes ... ff. Anaeroplasmataceae, ky. Anaeroplasma
239 edges: NA->ac, ac->af, af->av, av->bu, bu->dc ... ac->at, at->bs, bs->da, da->ff, ff->ky
6 data sets:
otu_table:
# A tibble: 1,866 x 49
taxon_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fh 0.000438 0.0000118 0.000435 0.0000257 0.000245 0.0000896 0.0000253 0 0.00000735
2 fi 0.0000438 0.0000589 0.000814 0.000103 0.000768 0.000381 0.000160 0.0000110 0.00000735
3 fi 0.000728 0 0.000860 0.00237 0.000309 0.000179 0.0000422 0 0.0000294
# ... with 1,863 more rows, and 39 more variables: Sample_21 <dbl>, Sample_22 <dbl>,
# Sample_24 <dbl>, Sample_25 <dbl>, Sample_26 <dbl>, Sample_30 <dbl>, Sample_50 <dbl>,
# Sample_59 <dbl>, Sample_60 <dbl>, Sample_61 <dbl>, Sample_7 <dbl>, Sample_27 <dbl>,
# Sample_45 <dbl>, Sample_5 <dbl>, Sample_57 <dbl>, Sample_20 <dbl>, Sample_29 <dbl>,
# Sample_11 <dbl>, Sample_12 <dbl>, Sample_15 <dbl>, Sample_16 <dbl>, Sample_18 <dbl>,
# Sample_19 <dbl>, Sample_23 <dbl>, Sample_28 <dbl>, Sample_31 <dbl>, Sample_35 <dbl>,
# Sample_40 <dbl>, Sample_41 <dbl>, Sample_46 <dbl>, Sample_47 <dbl>, Sample_51 <dbl>,
# Sample_55 <dbl>, Sample_56 <dbl>, Sample_58 <dbl>, Sample_8 <dbl>, Sample_4 <dbl>,
# Sample_52 <dbl>, Sample_36 <dbl>
tax_data:
# A tibble: 1,866 x 8
taxon_id Kingdom Phylum Class Order Family Genus Species
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 fh Bacteria Firmicutes Negativicutes Selenomonadales Veillonellaceae Megamonas uncultur~
2 fi Bacteria Firmicutes Negativicutes Selenomonadales Acidaminococcaceae Phascola~ uncultur~
3 fi Bacteria Firmicutes Negativicutes Selenomonadales Acidaminococcaceae Phascola~ uncultur~
# ... with 1,863 more rows
sam_data:
# A tibble: 48 x 9
sample_ids X.SampleID BarcodeSequence LinkerPrimerSeque~ ForwardFastqFile ReverseFastqFile
<chr> <chr> <chr> <chr> <chr> <chr>
1 Sample_1 Sample_1 <NA> <NA> 33749_S1_L001_R1_00~ 33749_S1_L001_R2_0~
2 Sample_2 Sample_2 <NA> <NA> 33739_S2_L001_R1_00~ 33739_S2_L001_R2_0~
3 Sample_3 Sample_3 <NA> <NA> 33737_S3_L001_R1_00~ 33737_S3_L001_R2_0~
# ... with 45 more rows, and 3 more variables: TreatmentGroup <chr>, SampleName <chr>,
# Description <chr>
phylo_tree:
Phylogenetic tree with 1955 tips and 1954 internal nodes.
Tip labels:
New.CleanUp.ReferenceOTU177, New.ReferenceOTU1091, New.ReferenceOTU2352, EU774211.1.1284, New.ReferenceOTU1302, New.ReferenceOTU239, ...
Node labels:
Root, 0.917, 0.794, 0.853, 0.768, 0.880, ...
Rooted; includes branch lengths.
tax_table:
# A tibble: 420 x 49
taxon_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ac 0.987 0.994 0.982 0.991 0.983 0.988 0.986 0.991 0.988
2 af 0.473 0.370 0.468 0.397 0.392 0.632 0.436 0.363 0.385
3 ag 0.0370 0.131 0.00445 0.0327 0.00695 0.00538 0.00938 0.00385 0.0235
# ... with 417 more rows, and 39 more variables: Sample_21 <dbl>, Sample_22 <dbl>, Sample_24 <dbl>,
# Sample_25 <dbl>, Sample_26 <dbl>, Sample_30 <dbl>, Sample_50 <dbl>, Sample_59 <dbl>,
# Sample_60 <dbl>, Sample_61 <dbl>, Sample_7 <dbl>, Sample_27 <dbl>, Sample_45 <dbl>,
# Sample_5 <dbl>, Sample_57 <dbl>, Sample_20 <dbl>, Sample_29 <dbl>, Sample_11 <dbl>,
# Sample_12 <dbl>, Sample_15 <dbl>, Sample_16 <dbl>, Sample_18 <dbl>, Sample_19 <dbl>,
# Sample_23 <dbl>, Sample_28 <dbl>, Sample_31 <dbl>, Sample_35 <dbl>, Sample_40 <dbl>,
# Sample_41 <dbl>, Sample_46 <dbl>, Sample_47 <dbl>, Sample_51 <dbl>, Sample_55 <dbl>,
# Sample_56 <dbl>, Sample_58 <dbl>, Sample_8 <dbl>, Sample_4 <dbl>, Sample_52 <dbl>,
# Sample_36 <dbl>
diff_table:
# A tibble: 420 x 11
taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff wilcox_p_value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ac Stressed Control 0.00272 0.00186 0.00106 0.313
2 af Stressed Control 0.322 0.0850 0.102 0.0000803
3 ag Stressed Control -0.542 -0.00454 -0.00455 0.219
# ... with 417 more rows, and 4 more variables: hartigan_dip_treat1 <dbl>,
# hartigan_dip_treat2 <dbl>, bimodality_coeff_treat1 <dbl>, bimodality_coeff_treat2 <dbl>
0 functions:
> m$data$otu_table$taxon_id
[1] "fh" "fi" "fi" "fi" "fi" "fi" "fj" "de" "fj" "fj" "fl" "fl" "fl" "fl" "fl" "fm" "fm" "fm" "fm" "fm" "fm"
[22] "fj" "de" "fn" "df" "fp" "fp" "fp" "fq" "fq" "fr" "fn" "fr" "fn" "fs" "df" "ft" "ft" "ft" "fl" "fl" "fl"
[43] "fl" "fl" "ft" "ft" "de" "de" "fj" "fu" "fu" "fv" "de" "de" "fx" "fx" "de" "fx" "fy" "fy" "fj" "fj" "fv"
[64] "fv" "fm" "fs" "fs" "fs" "fz" "de" "ga" "fm" "fm" "fm" "fm" "fs" "gb" "gc" "de" "gd" "gd" "fm" "ge" "gd"
[85] "gd" "gd" "de" "gf" "gf" "ge" "ge" "ge" "ge" "ge" "ge" "ge" "ge" "ge" "ge" "ge" "ge" "ge" "gd" "gg" "gg"
[106] "gg" "gg" "gh" "gg" "gg" "gg" "fu" "fu" "fu" "gi" "de" "fs" "fs" "fs" "fs" "de" "fs" "fs" "fm" "de" "fs"
[127] "fs" "gj" "fs" "fs" "fs" "gk" "gl" "gl" "fv" "gl" "de" "gm" "de" "gn" "gl" "fm" "de" "go" "gp" "de" "af"
[148] "dc" "de" "gs" "gt" "gs" "gs" "gs" "gs" "gs" "gu" "gs" "gs" "gs" "gt" "gt" "gt" "gt" "gs" "gs" "gs" "gs"
[169] "gs" "gb" "gb" "gt" "gt" "gt" "gs" "gs" "gv" "gv" "gw" "gw" "gu" "gu" "de" "gx" "gx" "gy" "gy" "bv" "ha"
[190] "gx" "bv" "hb" "hc" "hd" "fj" "af" "af" "hb" "he" "he" "he" "he" "hf" "hg" "hg" "hg" "hg" "hh" "hi" "hi"
[211] "hi" "hj" "dr" "hl" "hm" "oh" "hm" "hm" "oj" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "cf" "cf" "bd" "hs"
[232] "hs" "hs" "hs" "hs" "hs" "hs" "hs" "ht" "hu" "hv" "hv" "hv" "hw" "ea" "ea" "hy" "hz" "df" "ib" "ib" "ic"
[253] "ic" "ic" "ic" "ic" "ic" "ic" "ic" "ic" "ic" "ic" "ic" "ic" "df" "id" "ie" "ie" "id" "fn" "fn" "if" "de"
[274] "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "ig" "de" "ht" "ht" "ih"
[295] "ii" "gm" "gm" "gm" "gm" "gm" "de" "go" "fj" "gm" "gm" "ik" "ik" "gm" "gj" "gm" "gm" "il" "de" "gm" "de"
[316] "gm" "ik" "de" "gm" "gm" "gm" "ht" "gl" "de" "de" "de" "im" "de" "im" "im" "fm" "fm" "ga" "ga" "de" "go"
[337] "de" "de" "fj" "in" "in" "in" "io" "io" "io" "io" "io" "io" "io" "io" "io" "io" "io" "io" "io" "io" "in"
[358] "io" "in" "io" "io" "io" "df" "in" "in" "in" "in" "in" "in" "in" "in" "io" "io" "in" "in" "in" "in" "in"
[379] "in" "in" "io" "io" "io" "io" "io" "io" "io" "io" "io" "df" "io" "io" "io" "io" "df" "df" "df" "ip" "ip"
[400] "iq" "ir" "is" "is" "fj" "gm" "de" "ge" "ed" "ed" "ed" "ee" "iv" "ee" "iv" "iv" "iv" "ee" "ee" "iv" "ix"
[421] "ix" "ix" "ix" "iy" "ho" "ho" "ho" "iz" "ja" "ja" "de" "fy" "fy" "gm" "de" "de" "gm" "gm" "gm" "gj" "gm"
[442] "gm" "gm" "gm" "gm" "gm" "gm" "gm" "de" "gm" "gm" "gj" "de" "gm" "de" "gm" "de" "gj" "gj" "gj" "gj" "gj"
[463] "gj" "gj" "gm" "gj" "gj" "fj" "fj" "fj" "fj" "fj" "fj" "fj" "de" "gp" "gp" "fp" "fp" "fp" "fp" "fp" "fp"
[484] "fp" "fp" "fp" "fp" "fp" "fp" "fp" "fp" "fp" "fp" "de" "de" "fj" "de" "de" "fn" "fn" "fr" "bv" "fn" "fn"
[505] "io" "fi" "fi" "af" "fi" "fi" "fi" "fi" "fi" "fi" "fi" "fi" "fi" "fi" "af" "af" "af" "af" "af" "jb" "fi"
[526] "fr" "gb" "gb" "gb" "ge" "df" "df" "df" "fq" "fq" "df" "fq" "df" "ac" "fn" "fn" "fn" "fn" "fn" "fn" "fn"
[547] "df" "df" "bv" "bv" "ig" "df" "fs" "jd" "jd" "fr" "fr" "if" "if" "if" "bv" "je" "fr" "fr" "de" "jf" "jf"
[568] "jf" "df" "qw" "jh" "jh" "ei" "iv" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "ho"
[589] "ho" "ho" "ho" "ho" "ho" "ho" "ho" "ho" "jj" "jj" "rd" "jj" "gu" "gu" "gu" "gu" "gu" "gu" "gu" "gu" "jj"
[610] "jj" "jj" "jj" "jj" "jj" "jj" "jj" "re" "re" "re" "re" "rd" "rd" "jj" "jj" "jk" "hv" "hv" "hv" "hv" "bd"
[631] "bd" "cq" "cq" "cq" "cq" "cq" "cq" "cq" "cq" "jo" "jo" "ib" "ib" "ib" "ib" "hb" "hb" "jd" "jp" "de" "df"
[652] "fr" "fr" "fq" "df" "df" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr"
[673] "fr" "jd" "jh" "jh" "jh" "ho" "de" "fj" "fr" "ge" "fj" "jd" "jp" "fj" "fx" "ee" "jq" "jq" "ee" "hs" "hs"
[694] "hs" "hs" "iv" "iv" "iv" "iv" "jr" "ho" "rq" "ix" "iv" "iv" "ee" "hs" "hs" "hs" "hs" "hs" "hs" "hs" "hs"
[715] "hs" "hs" "hs" "hs" "hs" "hs" "hs" "hs" "hs" "hs" "ee" "eq" "eq" "eq" "eq" "eq" "eq" "jj" "iv" "ch" "iv"
[736] "iv" "iv" "iv" "iv" "iv" "iv" "iv" "iv" "iv" "iv" "iv" "iv" "jv" "iv" "iv" "iv" "iv" "iv" "dy" "dy" "iv"
[757] "jv" "dy" "iv" "iv" "dy" "dy" "dy" "dy" "dy" "dy" "dy" "dy" "dy" "dy" "dy" "dy" "dy" "dy" "dy" "iv" "iv"
[778] "iv" "dy" "iv" "ee" "hz" "ib" "de" "fl" "es" "jj" "jj" "jj" "jj" "jj" "jj" "rd" "jj" "jj" "jj" "jj" "jj"
[799] "jj" "jj" "jj" "jj" "jj" "jj" "jj" "jj" "jj" "jj" "jj" "jj" "af" "af" "fx" "hc" "jz" "jz" "jz" "es" "hb"
[820] "hb" "hb" "es" "hb" "hb" "hb" "hb" "hb" "hb" "hb" "hb" "hb" "hb" "hb" "hb" "hb" "es" "es" "es" "kb" "kb"
[841] "eu" "jo" "jo" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je"
[862] "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je"
[883] "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "je" "fp" "hz" "hz" "fq" "fr"
[904] "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr"
[925] "fr" "fr" "fr" "fr" "df" "ja" "ja" "ja" "ja" "ja" "ja" "ja" "ja" "ib" "ib" "ib" "ib" "ib" "ib" "ib" "ib"
[946] "ib" "ib" "ib" "kd" "fr" "fr" "fr" "fr" "fr" "df" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "fr" "ke" "fp" "fp"
[967] "fq" "fq" "fr" "fr" "fr" "fr" "fr" "df" "fp" "df" "kd" "df" "df" "df" "fr" "fp" "fp" "fq" "fq" "fr" "fq"
[988] "fq" "fq" "fq" "fq" "fr" "fr" "fq" "fq" "fq" "fq" "fq" "fq" "fq"
[ reached getOption("max.print") -- omitted 866 entries ]
> unique(m$data$otu_table$taxon_id)
[1] "fh" "fi" "fj" "de" "fl" "fm" "fn" "df" "fp" "fq" "fr" "fs" "ft" "fu" "fv" "fx" "fy" "fz" "ga" "gb" "gc"
[22] "gd" "ge" "gf" "gg" "gh" "gi" "gj" "gk" "gl" "gm" "gn" "go" "gp" "af" "dc" "gs" "gt" "gu" "gv" "gw" "gx"
[43] "gy" "bv" "ha" "hb" "hc" "hd" "he" "hf" "hg" "hh" "hi" "hj" "dr" "hl" "hm" "oh" "oj" "ho" "cf" "bd" "hs"
[64] "ht" "hu" "hv" "hw" "ea" "hy" "hz" "ib" "ic" "id" "ie" "if" "ig" "ih" "ii" "ik" "il" "im" "in" "io" "ip"
[85] "iq" "ir" "is" "ed" "ee" "iv" "ix" "iy" "iz" "ja" "jb" "ac" "jd" "je" "jf" "qw" "jh" "ei" "jj" "rd" "re"
[106] "jk" "cq" "jo" "jp" "jq" "jr" "rq" "eq" "ch" "jv" "dy" "es" "jz" "kb" "eu" "kd" "ke" "kf" "kg" "kh" "ki"
[127] "ep" "kl" "km" "kn" "kp" "kq" "tc" "kr" "te" "ks" "kt" "ku" "kv" "kx" "ky" "kz" "ec" "lb" "lc" "ld" "le"
[148] "lf" "lg" "lh"
> m$data$tax_table$taxon_id
[1] "ac" "af" "ag" "ah" "ai" "aj" "ak" "al" "an" "ao" "ap" "aq" "as" "at" "av" "aw" "ay" "az" "ba" "bb" "bc"
[22] "bd" "be" "bf" "bg" "bh" "bi" "bk" "bl" "bm" "bn" "bo" "bp" "br" "bs" "bu" "bv" "bx" "by" "bz" "ca" "cb"
[43] "cc" "cd" "ce" "cf" "bd" "ch" "ci" "cj" "ck" "cl" "cn" "co" "bd" "cq" "cr" "cs" "ct" "cu" "cv" "cw" "cx"
[64] "cz" "da" "dc" "dd" "de" "df" "dh" "di" "dj" "dl" "dm" "dn" "do" "dp" "dq" "dr" "ds" "dt" "cf" "bd" "dx"
[85] "dy" "dz" "ea" "eb" "ec" "ed" "ee" "ef" "eh" "ei" "ej" "ek" "bd" "cq" "eo" "ep" "eq" "ch" "es" "et" "eu"
[106] "ew" "ex" "ey" "ez" "fa" "fb" "fc" "fe" "ff" "fh" "fi" "fj" "de" "fl" "fm" "fn" "fp" "fq" "fr" "fs" "ft"
[127] "fu" "fv" "fx" "fy" "fz" "ga" "gb" "gc" "gd" "ge" "gf" "gg" "gh" "gi" "gj" "gk" "gl" "gm" "gn" "go" "gp"
[148] "gs" "gt" "gu" "gv" "gw" "gx" "gy" "ha" "hb" "hc" "hd" "he" "hf" "hg" "hh" "hi" "hj" "hl" "hm" "hn" "ho"
[169] "cf" "bd" "hs" "ht" "hu" "hv" "hw" "ea" "hy" "hz" "df" "ib" "ic" "id" "ie" "if" "ig" "ih" "ii" "de" "ik"
[190] "il" "im" "in" "io" "ip" "iq" "ir" "is" "ed" "ee" "iv" "ee" "ix" "iy" "iz" "ja" "jb" "jd" "je" "jf" "jg"
[211] "jh" "ei" "jj" "jk" "bd" "cq" "jo" "jp" "jq" "jr" "js" "eq" "ch" "jv" "dy" "es" "jz" "kb" "eu" "kd" "ke"
[232] "kf" "kg" "kh" "ki" "ep" "kl" "km" "kn" "dc" "kp" "kq" "kr" "ks" "kt" "ku" "kv" "kx" "ky" "kz" "ec" "lb"
[253] "lc" "ld" "le" "lf" "lg" "lh" "fh" "fi" "fj" "de" "fj" "fl" "fm" "de" "fn" "fp" "fq" "fr" "ft" "fu" "fv"
[274] "fx" "fy" "fy" "fs" "fz" "ga" "fs" "gd" "gf" "ge" "gg" "gh" "gi" "gj" "gl" "gl" "gm" "gn" "go" "gp" "gs"
[295] "gt" "gu" "gb" "gv" "gw" "gx" "gy" "ha" "hb" "hc" "hd" "he" "hg" "hh" "hi" "hl" "oh" "hm" "oj" "ho" "ho"
[316] "cf" "bd" "hs" "ht" "hu" "hv" "ea" "hy" "hz" "df" "ib" "ic" "id" "ie" "if" "ig" "ht" "ih" "ii" "de" "ik"
[337] "gm" "im" "in" "io" "io" "ip" "iq" "is" "ed" "ee" "iv" "ee" "ix" "iy" "iz" "gj" "jb" "gb" "jd" "je" "jf"
[358] "qw" "jh" "ei" "ho" "jj" "rd" "re" "jk" "bd" "cq" "jo" "jp" "fr" "fx" "jq" "jr" "rq" "hs" "eq" "ch" "jv"
[379] "dy" "es" "hc" "jz" "kb" "eu" "ja" "ja" "kd" "ke" "fq" "kf" "kg" "kh" "ki" "ep" "kl" "km" "kn" "kp" "kq"
[400] "tc" "te" "kt" "ku" "kv" "kv" "kx" "ky" "ky" "ec" "lb" "lc" "io" "ld" "jv" "le" "lf" "lf" "lg" "lh" "ii"
> unique(m$data$tax_table$taxon_id)
[1] "ac" "af" "ag" "ah" "ai" "aj" "ak" "al" "an" "ao" "ap" "aq" "as" "at" "av" "aw" "ay" "az" "ba" "bb" "bc"
[22] "bd" "be" "bf" "bg" "bh" "bi" "bk" "bl" "bm" "bn" "bo" "bp" "br" "bs" "bu" "bv" "bx" "by" "bz" "ca" "cb"
[43] "cc" "cd" "ce" "cf" "ch" "ci" "cj" "ck" "cl" "cn" "co" "cq" "cr" "cs" "ct" "cu" "cv" "cw" "cx" "cz" "da"
[64] "dc" "dd" "de" "df" "dh" "di" "dj" "dl" "dm" "dn" "do" "dp" "dq" "dr" "ds" "dt" "dx" "dy" "dz" "ea" "eb"
[85] "ec" "ed" "ee" "ef" "eh" "ei" "ej" "ek" "eo" "ep" "eq" "es" "et" "eu" "ew" "ex" "ey" "ez" "fa" "fb" "fc"
[106] "fe" "ff" "fh" "fi" "fj" "fl" "fm" "fn" "fp" "fq" "fr" "fs" "ft" "fu" "fv" "fx" "fy" "fz" "ga" "gb" "gc"
[127] "gd" "ge" "gf" "gg" "gh" "gi" "gj" "gk" "gl" "gm" "gn" "go" "gp" "gs" "gt" "gu" "gv" "gw" "gx" "gy" "ha"
[148] "hb" "hc" "hd" "he" "hf" "hg" "hh" "hi" "hj" "hl" "hm" "hn" "ho" "hs" "ht" "hu" "hv" "hw" "hy" "hz" "ib"
[169] "ic" "id" "ie" "if" "ig" "ih" "ii" "ik" "il" "im" "in" "io" "ip" "iq" "ir" "is" "iv" "ix" "iy" "iz" "ja"
[190] "jb" "jd" "je" "jf" "jg" "jh" "jj" "jk" "jo" "jp" "jq" "jr" "js" "jv" "jz" "kb" "kd" "ke" "kf" "kg" "kh"
[211] "ki" "kl" "km" "kn" "kp" "kq" "kr" "ks" "kt" "ku" "kv" "kx" "ky" "kz" "lb" "lc" "ld" "le" "lf" "lg" "lh"
[232] "oh" "oj" "qw" "rd" "re" "rq" "tc" "te"
The taxon_ids that are present in the otu_table/tax_data tables are different from, but related to, the taxon_ids in the tax_table/diff_tables created by coercing the phyloseq object into a taxmap object as shown here.
I am not sure I understand. All of the taxon IDs should come from the same set, the result of m$taxon_ids(). If there are taxon IDs (besides NA) that are not in this set, then that is either a bug or the result of filtering using something besides filter_*. The otu_table/tax_data tables will usually only contain "leaf" taxa (e.g. species or genus) since that is what OTUs are usually assigned to, whereas the tax_table/diff_tables can have all taxa, including intermediates, so that could be why they are different subsets of a common set of IDs.
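One way to test the "common set" claim is to compare each table's taxon_id column against the reference set. The sketch below uses base R only and toy vectors. The helper name find_stray_ids and the ID "zz" are made up for illustration. With a real taxmap, all_ids would be m$taxon_ids() and the list entries would be columns such as m$data$otu_table$taxon_id.

```r
# Hypothetical helper: for each table, list the non-NA taxon IDs that are
# absent from the reference set of IDs.
find_stray_ids <- function(all_ids, table_ids) {
  lapply(table_ids, function(ids) setdiff(unique(ids[!is.na(ids)]), all_ids))
}

# Toy stand-ins for m$taxon_ids() and the taxon_id columns of two tables.
all_ids <- c("ac", "af", "av", "fh", "fi")
table_ids <- list(
  otu_table = c("fh", "fi", "fi"),       # leaf taxa only, with repeats
  tax_table = c("ac", "af", "fh", "zz")  # "zz" is a deliberately bad ID
)

stray <- find_stray_ids(all_ids, table_ids)
print(stray)
```

An empty result for every table would mean the tables only use different subsets of the same ID set. Any ID that is reported would point to a bug or to filtering done outside of filter_*.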
While the different taxon_ids are obviously just part of the taxmap hierarchy, the confusion could be remedied by including the OTU label for taxon_ids mentioned in #219 that are legitimately identified/annotated organisms, while intermediate taxon_ids, which are used to identify/accumulate data for each of the intermediate taxonomic ranks, are left without this OTU label.
This is a good idea for specific datasets where the "legitimately identified/annotated organisms" are known, but I am not sure how this could be abstracted to data of unknown characteristics. What about when multiple OTUs match a single taxon? If you know that there is one OTU per taxon, you can do something like the following:
> library(metacoder)
> print(ex_taxmap)
<Taxmap>
17 taxa: b. Mammalia, c. Plantae, d. Felidae, e. Notoryctidae, f. Hominidae ... o. typhlops, p. sapiens, q. lycopersicum, r. tuberosum
17 edges: NA->b, NA->c, b->d, b->e, b->f, c->g, d->h, d->i, e->j, f->k, g->l, h->m, i->n, j->o, k->p, l->q, l->r
4 data sets:
info:
# A tibble: 6 x 4
taxon_id name n_legs dangerous
<chr> <chr> <dbl> <lgl>
1 m tiger 4.00 T
2 n cat 4.00 F
3 o mole 4.00 F
# ... with 3 more rows
phylopic_ids: a named vector of 'character' with 6 items
m. e148eabb-f138-43c6-b1e4-5cda2180485a, n. 12899ba0-9923-4feb-a7f9-758c3c7d5e13 ... r. 63604565-0406-460b-8cb8-1abe954b3f3a
foods: a list of 6 items named by taxa:
m, n, o, p, q, r
abund:
# A tibble: 8 x 5
taxon_id code sample_id count taxon_index
<chr> <fct> <fct> <dbl> <int>
1 m T A 1.00 1
2 n C A 2.00 2
3 o M B 5.00 3
# ... with 5 more rows
1 functions:
reaction
> (new_id_key <- ex_taxmap$map_data(taxon_ids, name))
b c d e f g h i j k l m n o p q
NA NA NA NA NA NA NA NA NA NA NA "tiger" "cat" "mole" "human" "tomato"
r
"potato"
> (new_ids <- ifelse(is.na(new_id_key), names(new_id_key), new_id_key))
b c d e f g h i j k l m n o p q
"b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "tiger" "cat" "mole" "human" "tomato"
r
"potato"
> ex_taxmap$replace_taxon_ids(new_ids = new_ids)
<Taxmap>
17 taxa: b. Mammalia, c. Plantae, d. Felidae, e. Notoryctidae ... mole. typhlops, human. sapiens, tomato. lycopersicum, potato. tuberosum
17 edges: NA->b, NA->c, b->d, b->e, b->f, c->g, d->h, d->i, e->j, f->k, g->l, h->tiger, i->cat, j->mole, k->human, l->tomato, l->potato
4 data sets:
info:
# A tibble: 6 x 4
taxon_id name n_legs dangerous
<chr> <chr> <dbl> <lgl>
1 tiger tiger 4.00 T
2 cat cat 4.00 F
3 mole mole 4.00 F
# ... with 3 more rows
phylopic_ids: a named vector of 'character' with 6 items
tiger. e148eabb-f138-43c6-b1e4-5cda2180485a ... potato. 63604565-0406-460b-8cb8-1abe954b3f3a
foods: a list of 6 items named by taxa:
tiger, cat, mole, human, tomato, potato
abund:
# A tibble: 8 x 5
taxon_id code sample_id count taxon_index
<chr> <fct> <fct> <dbl> <int>
1 tiger T A 1.00 1
2 cat C A 2.00 2
3 mole M B 5.00 3
# ... with 5 more rows
1 functions:
reaction
Another thing I often do is add OTU ids as a "rank" in the taxonomy. So there is an OTU "taxon" below species, or whatever it is assigned to. This way OTUs show up as nodes in the heat_trees. This is pretty easy to do when parsing your data from tables. Just add the OTU id column name to the class_cols option of parse_tax_data. Would this do what you want? I could easily add that as a T/F option to the phyloseq parser.
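To illustrate what treating the OTU id as the lowest "rank" does to the hierarchy, here is a minimal base-R sketch on a made-up table. The column name otu_id and the species labels are assumptions. With real data, the same ordered column names would be what goes into class_cols.

```r
# Toy table: two OTUs assigned to the same species, one to another species.
tax_data <- data.frame(
  otu_id  = c("OTU_1", "OTU_2", "OTU_3"),
  Genus   = c("Megamonas", "Megamonas", "Anaeroplasma"),
  Species = c("sp_a", "sp_a", "sp_b"),
  stringsAsFactors = FALSE
)

# Rank columns in order, with the OTU id appended as the lowest "rank".
class_cols <- c("Genus", "Species", "otu_id")

# One classification string per row; each OTU is now its own leaf.
lineages <- do.call(paste, c(tax_data[class_cols], sep = ";"))
print(lineages)

# Number of distinct nodes at each depth of the hierarchy.
n_nodes <- sapply(seq_along(class_cols), function(i) {
  length(unique(do.call(paste, c(tax_data[class_cols[seq_len(i)]], sep = ";"))))
})
print(n_nodes)
```

Because every OTU gets its own leaf, the "multiple OTUs match a single taxon" case no longer causes ambiguity: the shared species stays one node and the OTUs hang below it.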
The intermediate taxon_ids could also be given a group OTU label that gives access to the list of OTU labels that are included in that taxonomy.
That information can be gotten for any table using the obs function with the value option, e.g.:
> obs(ex_taxmap, "info", value = "name")
$b
m n o p
"tiger" "cat" "mole" "human"
$c
q r
"tomato" "potato"
$d
m n
"tiger" "cat"
$e
o
"mole"
$f
p
"human"
$g
q r
"tomato" "potato"
$h
m
"tiger"
$i
n
"cat"
$j
o
"mole"
$k
p
"human"
$l
q r
"tomato" "potato"
$m
m
"tiger"
$n
n
"cat"
$o
o
"mole"
$p
p
"human"
$q
q
"tomato"
$r
r
"potato"
This would also be another way to find new taxon IDs if you wanted to. Just replace the ID of any taxon that has exactly one OTU with that OTU's ID.
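A rough sketch of that replacement rule, in base R on a hand-copied subset of the obs output above. One caveat shows up immediately: an intermediate taxon with a single descendant (h above m) would receive the same new ID as its leaf. This sketch assumes taxon IDs should stay unique and falls back to the original ID on any collision, which is a choice made here and not something metacoder prescribes.

```r
# Stand-in for part of the obs(ex_taxmap, "info", value = "name") output.
obs_names <- list(
  d = c(m = "tiger", n = "cat"),
  h = c(m = "tiger"),
  m = c(m = "tiger"),
  l = c(q = "tomato", r = "potato"),
  q = c(q = "tomato"),
  r = c(r = "potato")
)

# Use the observation's name as the new ID when exactly one observation
# falls under the taxon; otherwise keep the taxon's own ID.
new_ids <- vapply(names(obs_names), function(id) {
  x <- obs_names[[id]]
  if (length(x) == 1) unname(x) else id
}, character(1))

# Assumption: IDs must be unique, so revert any colliding IDs to the originals.
dup <- new_ids %in% new_ids[duplicated(new_ids)]
new_ids[dup] <- names(new_ids)[dup]
print(new_ids)
```

The resulting named vector has the same shape as the new_ids vector passed to replace_taxon_ids in the earlier example.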
Thanks for the thoughts!