ropensci / brranching

I can haz all the phylogenies
https://docs.ropensci.org/brranching

Problems with matching species names to Zanne phylogeny when using 'phylomatic' in brranching 0.6.0 #42

Closed: JoanaBergmann closed this issue 3 years ago

JoanaBergmann commented 3 years ago

Hi!

I am using the phylomatic function in brranching to prune the Zanne tree for my species set. My code worked well with brranching 0.5.0 at the beginning of the year. Running it again now with brranching 0.6.0 (sessionInfo below), several species from my set no longer find a match in Zanne:

tree.species.set <- phylomatic(species.set$species_name, get="POST", storedtree="zanne2014")
NOTE: 145 taxa not matched: NA/acacia_crassicarpa/acacia_crassicarpa, NA/acacia_mangium/acacia_mangium, compositae/adenocaulon_himalaicum/adenocaulon_himalaicum, ...

All unmatched taxa either belong to compositae or have NA as family. When entering those species online at http://phylodiversity.net/phylomatic/ I realized that by using Asteraceae instead of Compositae, or by adding the missing family information, the phylomatic function can still match them with Zanne. As far as I can see, it makes no difference whether the genus is provided alone (fabaceae/acacia/acacia_crassicarpa) or as the whole species name (fabaceae/acacia_crassicarpa/acacia_crassicarpa).
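The two manual fixes described above (swapping Compositae for Asteraceae and filling in a missing family) can be sketched in base R. `patch_names` and the `genus_to_family` lookup are hypothetical names introduced for illustration; they are not part of brranching, and the family assignments have to be supplied by the user.

```r
# Hypothetical helper: patch name strings in the family/genus/genus_epithet
# format before sending them to Phylomatic.
patch_names <- function(x, genus_to_family = character()) {
  # 1. zanne2014 appears to expect asteraceae rather than compositae
  x <- sub("^compositae/", "asteraceae/", x)

  # 2. fill in a family where the first slot is the literal string "NA"
  missing_family <- grepl("^NA/", x)
  genus <- sub("^[^/]+/([^/_]+).*$", "\\1", x)    # genus = text up to first "_" in slot 2
  family <- unname(genus_to_family[genus])         # NA when the genus is not in the lookup
  fill <- missing_family & !is.na(family)
  x[fill] <- paste0(family[fill], "/", sub("^NA/", "", x[fill]))
  x
}

unmatched <- c(
  "NA/acacia_crassicarpa/acacia_crassicarpa",
  "compositae/adenocaulon_himalaicum/adenocaulon_himalaicum"
)
patched <- patch_names(unmatched, genus_to_family = c(acacia = "fabaceae"))
print(patched)
```

Names whose genus is missing from the lookup are left unchanged, so they can still be spotted and handled by hand.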

My first idea was a classic workaround: reinstalling brranching 0.5.0 or 0.4.0. But I got the following Error:

Unknown or uninitialised column: 'this'.
Unknown or uninitialised column: 'that'.
Error in if (nchar(as.character(dd$that), keepNA = FALSE) == 0) { :
  argument is of length zero

Any idea how to solve this problem would be highly appreciated! Thanks in advance!!

Session Info

```r
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.3.1 tidyr_1.1.2
[7] tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0 adephylo_1.1-11 ade4_1.7-15 geiger_2.0.7
[13] caper_1.0.1 mvtnorm_1.1-1 MASS_7.3-51.5 viridis_0.5.1 viridisLite_0.3.0 shape_1.4.4
[19] phytools_0.7-47 maps_3.3.0 ape_5.4-1 vegan_2.5-6 lattice_0.20-38 permute_0.9-5
[25] brranching_0.6.0 lme4_1.1-23 Matrix_1.2-18

loaded via a namespace (and not attached):
[1] readxl_1.3.1 uuid_0.1-4 backports_1.1.9 Hmisc_4.4-1
[5] fastmatch_1.1-0 plyr_1.8.6 igraph_1.2.5 lazyeval_0.2.2
[9] sp_1.4-2 splines_3.6.3 rncl_0.8.4 urltools_1.7.3
[13] digest_0.6.25 foreach_1.5.0 htmltools_0.5.0 gdata_2.18.0
[17] fansi_0.4.1 magrittr_1.5 checkmate_2.0.0 cluster_2.1.0
[21] modelr_0.1.8 gmodels_2.18.1 prettyunits_1.1.1 jpeg_0.1-8.1
[25] colorspace_1.4-1 rvest_0.3.6 blob_1.2.1 haven_2.3.1
[29] xfun_0.16 crayon_1.3.4 jsonlite_1.7.1 phylobase_0.8.10
[33] survival_3.1-8 zoo_1.8-8 phangorn_2.5.5 iterators_1.0.12
[37] glue_1.4.2 gtable_0.3.0 seqinr_3.6-1 scales_1.1.1
[41] DBI_1.1.0 Rcpp_1.0.5 plotrix_3.7-8 spData_0.3.8
[45] xtable_1.8-4 progress_1.2.2 htmlTable_2.0.1 units_0.6-7
[49] tmvnsim_1.0-2 spdep_1.1-5 foreign_0.8-75 subplex_1.6
[53] bold_1.1.0 deSolve_1.28 Formula_1.2-3 animation_2.6
[57] htmlwidgets_1.5.1 httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.1
[61] pkgconfig_2.0.3 reshape_0.8.8 XML_3.99-0.3 dbplyr_1.4.4
[65] deldir_0.1-28 nnet_7.3-12 conditionz_0.1.0 crul_1.0.0
[69] later_1.1.0.1 tidyselect_1.1.0 rlang_0.4.7 reshape2_1.4.4
[73] cellranger_1.1.0 munsell_0.5.0 tools_3.6.3 cli_2.0.2
[77] generics_0.0.2 broom_0.7.0 fastmap_1.0.1 phylocomr_0.3.2
[81] fs_1.5.0 knitr_1.29 nlme_3.1-144 mime_0.9
[85] taxize_0.9.97 adegenet_2.1.3 xml2_1.3.2 compiler_3.6.3
[89] rstudioapi_0.11 curl_4.3 png_0.1-7 e1071_1.7-3
[93] reprex_0.3.0 clusterGeneration_1.3.4 statmod_1.4.34 RNeXML_2.4.5
[97] stringi_1.4.6 classInt_0.4-3 nloptr_1.2.2.2 vctrs_0.3.4
[101] LearnBayes_2.15.1 pillar_1.4.6 lifecycle_0.2.0 triebeard_0.3.0
[105] combinat_0.0-8 data.table_1.13.0 raster_3.3-13 httpuv_1.5.4
[109] R6_2.4.1 latticeExtra_0.6-29 promises_1.1.1 KernSmooth_2.23-16
[113] gridExtra_2.3 codetools_0.2-16 boot_1.3-24 gtools_3.8.2
[117] assertthat_0.2.1 withr_2.2.0 httpcode_0.3.0 mnormt_2.0.2
[121] mgcv_1.8-31 expm_0.999-5 parallel_3.6.3 hms_0.5.3
[125] quadprog_1.5-8 grid_3.6.3 rpart_4.1-15 class_7.3-15
[129] coda_0.19-3 minqa_1.2.4 sf_0.9-5 lubridate_1.7.9
[133] numDeriv_2016.8-1.1 scatterplot3d_0.3-41 shiny_1.5.0 base64enc_0.1-3
[137] tinytex_0.25
```

sckott commented 3 years ago

Thanks for opening the issue @JoanaBergmann - To try to reproduce the problem I need the species names you used.

JoanaBergmann commented 3 years ago

Dear @sckott - thanks for taking care! The species names from my list that didn't match are the following:

setdiff(species.set$species_name, tree.species.set$tip.label)
[1] "Acacia_crassicarpa" "Acacia_mangium" "Adenocaulon_himalaicum"
[4] "Ageratina_altissima" "Agoseris_glauca" "Ajania_potaninii"
[7] "Ambrosia_psilostachya" "Anaphalis_aureopunctata" "Anaphalis_hancockii"
[10] "Antennaria_neglecta" "Antennaria_rosulata" "Anthemis_tinctoria"
[13] "Anthyllis_terniflora" "Archidendron_clypearia" "Archidendron_lucidum"
[16] "Arnica_fulgens" "Arnica_sororia" "Artemisia_barrelieri"
[19] "Artemisia_cana" "Artemisia_carruthii" "Artemisia_codonocephala"
[22] "Artemisia_halodendron" "Artemisia_igniaria" "Artemisia_mongolica"
[25] "Artemisia_nanschanica" "Artemisia_ordosica" "Aster_lanceolatus"
[28] "Astragalus_aboriginum" "Astragalus_bisulcatus" "Astragalus_dasyglottis"
[31] "Astragalus_galactites" "Astragalus_humistratus" "Astragalus_oxyglottis"
[34] "Astragalus_rusbyi" "Atractylis_humilis" "Bahia_dissecta"
[37] "Bauhinia_brachycarpa" "Bauhinia_championii" "Celmisia_angustifolia"
[40] "Celmisia_lyallii" "Centaurea_pectinata" "Cirsium_acaulon"
[43] "Cirsium_altissimum" "Cirsium_wheeleri" "Clitoria_javitensis"
[46] "Cosmos_parviflorus" "Cytisus_grandiflorus" "Dalea_villosa"
[49] "Desmodium_canadense" "Echinacea_atrorubens" "Erigeron_canadensis"
[52] "Eupatorium_purpureum" "Genista_triacanthos" "Hedysarum_fruticosum"
[55] "Helianthella_quinquenervis" "Helianthus_carnosus" "Helianthus_cusickii"
[58] "Helianthus_debilis" "Helianthus_floridanus" "Helianthus_longifolius"
[61] "Helianthus_neglectus" "Helianthus_niveus" "Helianthus_pauciflorus"
[64] "Helianthus_petiolaris" "Helianthus_praecox" "Heliomeris_multiflora"
[67] "Heliopsis_helianthoides" "Heteropappus_altaicus" "Hieracium_fendleri"
[70] "Hieracium_lepidulum" "Hymenopappus_mexicanus" "Hymenoxys_richardsonii"
[73] "Indigofera_silvestrii" "Indigofera_szechuensis" "Inga_loubryana"
[76] "Inga_rubiginosa" "Inga_stipularis" "Ixeridium_gracile"
[79] "Karelinia_caspia" "Lactuca_ludoviciana" "Lactuca_tatarica"
[82] "Laennecia_schiedeana" "Lathyrus_leucanthus" "Lathyrus_venosus"
[85] "Leontodon_autumnalis" "Leontopodium_nanum" "Leucanthemum_ircutianum"
[88] "Liatris_mucronata" "Lotus_wrightii" "Lupinus_kingii"
[91] "Lysiloma_latisiliquum" "Machaeranthera_canescens" "Machaeranthera_gracilis"
[94] "Medicago_varia" "Melilotus_wolgicus" "Nabalus_tatarinowii"
[97] "Olearia_avicenniifolia" "Oxytropis_hailarensis" "Oxytropis_racemosa"
[100] "Packera_multilobata" "Packera_plattensis" "Pilosella_officinarum"
[103] "Pilosella_peleteriana" "Pithecellobium_lucidum" "Podocarpium_podocarpum"
[106] "Prenanthes_aspera" "Pseudognaphalium_macounii" "Psoralea_bituminosa"
[109] "Pueraria_stricta" "Raoulia_subsericea" "Ratibida_pinnata"
[112] "Rudbeckia_laciniata" "Rudbeckia_serotina" "Saussurea_amurensis"
[115] "Saussurea_mongolica" "Saussurea_nivea" "Scorzonera_divaricata"
[118] "Senecio_actinella" "Senecio_crassulus" "Senecio_eremophilus"
[121] "Senecio_erucifolius" "Senna_marilandica" "Seriphidium_terrae-albae"
[124] "Silphium_integrifolium" "Silphium_laciniatum" "Solidago_decurrens"
[127] "Solidago_mollis" "Solidago_nana" "Stauracanthus_genistoides"
[130] "Styphnolobium_japonicum" "Symphyotrichum_ascendens" "Symphyotrichum_laeve"
[133] "Symphyotrichum_novae-angliae" "Symphyotrichum_oolentangiense" "Taraxacum_campylodes"
[136] "Tephroseris_kirilowii" "Tragopogon_duarius" "Tripleurospermum_perforatum"
[139] "Verbesina_alternifolia" "Vicia_pulchella" "Xanthium_albinum"

sckott commented 3 years ago

With the phylomatic function, if you pass in taxonomic names and leave the parameter taxnames=TRUE as is, we internally use the phylomatic_names function to put the names in the proper format of family/genus/genus_epithet. You can set the data source used by phylomatic_names with the db parameter.

It seems that zanne2014 uses the family name Asteraceae instead of Compositae. So if you change to db="ncbi" you should be good. Note, however, that the non-apg options for the db parameter require looking up names on the web, which takes longer, so it's best to do that beforehand. Compare these two:

phylomatic_names("Adenocaulon himalaicum", db = "apg")
#> [1] "compositae/adenocaulon/adenocaulon_himalaicum"

phylomatic_names("Adenocaulon himalaicum", db = "ncbi")
#> [1] "asteraceae/adenocaulon/adenocaulon_himalaicum"

So probably best to do:

x <- phylomatic_names(your_names, db = "ncbi")
# then pass that to phylomatic
phylomatic(x, taxnames = FALSE, get="POST", storedtree="zanne2014")
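
Since the non-apg lookups go over the web and take a while, one option is to store the result on disk so the lookup runs only once. The following is a minimal base R sketch; `cached` and the file name are hypothetical and not part of brranching.

```r
# Hypothetical helper: compute a value once, then reuse it from disk.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))   # reuse the stored result
  }
  value <- compute()        # slow step, e.g. a web lookup
  saveRDS(value, path)
  value
}

# Not run here (needs brranching and web access); the file name is an assumption:
# x <- cached("phylomatic_names_ncbi.rds",
#             function() phylomatic_names(your_names, db = "ncbi"))
# phylomatic(x, taxnames = FALSE, get = "POST", storedtree = "zanne2014")
```

Delete the file whenever the species list changes, otherwise the stale names are reused.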

JoanaBergmann commented 3 years ago

Dear @sckott - you pointed me towards the right solution with your answer last time - thanks again! Now a different problem arose and I hope you might be able to help me again:

I am using the phylomatic() function in the package 'brranching' (version 0.6.0) to build a phylogenetic tree based on the Zanne phylogeny for a set of 2405 species (species set attached) with full phylogenetic information from ncbi:

treefull <- phylomatic(PCA_NEW$full_species_ncbi, taxnames = FALSE, get="POST", storedtree="zanne2014")

When I run the function, RStudio keeps running without returning any result. I then divided the species set into three parts and ran phylomatic() on each. With the separate datasets, everything worked fine. However, there are no synonyms left in the species set that might cause an endless loop. Any idea how to solve this problem would be highly appreciated! Thank you very much for your support!!
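
Dividing the species set into parts, as described above, can be sketched in base R as follows. `chunk_names` is a hypothetical helper, and the toy vector stands in for the real 2405 names. Note that this yields one tree per part, not a single combined tree.

```r
# Hypothetical helper: split a vector into n_chunks parts of near-equal size,
# preserving the original order.
chunk_names <- function(x, n_chunks = 3) {
  group <- ceiling(seq_along(x) * n_chunks / length(x))
  unname(split(x, group))
}

# Toy stand-in for PCA_NEW$full_species_ncbi
toy_names <- paste0("family/genus/genus_sp", 1:7)
chunks <- chunk_names(toy_names, n_chunks = 3)
print(lengths(chunks))

# Not run here (needs brranching and the Phylomatic web service):
# trees <- lapply(chunks, function(ch)
#   phylomatic(ch, taxnames = FALSE, get = "POST", storedtree = "zanne2014"))
```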

Species_FullPhylo.txt

Session Info:
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] phytools_0.7-47 maps_3.3.0 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0 purrr_0.3.4 readr_1.3.1 tidyr_1.1.0 tibble_3.0.1
[10] tidyverse_1.3.0 taxize_0.9.97 Hmisc_4.4-0 ggplot2_3.3.2 Formula_1.2-3 survival_3.2-7 caper_1.0.1 mvtnorm_1.1-1 MASS_7.3-53
[19] ape_5.4 pairwiseAdonis_0.0.1 cluster_2.1.0 vegan_2.5-6 lattice_0.20-41 permute_0.9-5 shape_1.4.4 brranching_0.6.0 lme4_1.1-23
[28] Matrix_1.2-18

loaded via a namespace (and not attached):
[1] minqa_1.2.4 colorspace_1.4-1 phylocomr_0.3.2 ellipsis_0.3.1 htmlTable_1.13.3 fs_1.4.1 base64enc_0.1-3 httpcode_0.3.0
[9] rstudioapi_0.11 lubridate_1.7.9 fansi_0.4.1 xml2_1.3.2 codetools_0.2-16 splines_4.0.3 mnormt_2.0.0 bold_1.1.0
[17] knitr_1.28 jsonlite_1.6.1 nloptr_1.2.2.1 broom_0.5.6 dbplyr_1.4.4 png_0.1-7 httr_1.4.1 compiler_4.0.3
[25] backports_1.1.7 assertthat_0.2.1 cli_2.0.2 acepack_1.4.1 htmltools_0.4.0 tools_4.0.3 igraph_1.2.5 coda_0.19-3
[33] gtable_0.3.0 glue_1.4.1 clusterGeneration_1.3.4 fastmatch_1.1-0 Rcpp_1.0.4.6 cellranger_1.1.0 vctrs_0.3.1 crul_0.9.0
[41] nlme_3.1-149 conditionz_0.1.0 iterators_1.0.12 xfun_0.14 rvest_0.3.5 lifecycle_0.2.0 phangorn_2.5.5 gtools_3.8.2
[49] statmod_1.4.34 zoo_1.8-8 scales_1.1.1 hms_0.5.3 parallel_4.0.3 expm_0.999-4 animation_2.6 RColorBrewer_1.1-2
[57] yaml_2.2.1 curl_4.3 gridExtra_2.3 rpart_4.1-15 reshape_0.8.8 latticeExtra_0.6-29 stringi_1.4.6 foreach_1.5.0
[65] plotrix_3.7-8 checkmate_2.0.0 boot_1.3-25 rlang_0.4.6 pkgconfig_2.0.3 htmlwidgets_1.5.1 tidyselect_1.1.0 plyr_1.8.6
[73] magrittr_1.5 R6_2.4.1 generics_0.0.2 combinat_0.0-8 DBI_1.1.0 haven_2.3.1 pillar_1.4.4 foreign_0.8-80
[81] withr_2.2.0 mgcv_1.8-33 scatterplot3d_0.3-41 nnet_7.3-14 modelr_0.1.8 crayon_1.3.4 uuid_0.1-4 tmvnsim_1.0-2
[89] jpeg_0.1-8.1 readxl_1.3.1 grid_4.0.3 data.table_1.12.8 blob_1.2.1 reprex_0.3.0 digest_0.6.25 numDeriv_2016.8-1.1
[97] munsell_0.5.0 quadprog_1.5-8

sckott commented 3 years ago

So in this code

treefull <- phylomatic(PCA_NEW$full_species_ncbi, taxnames = FALSE, get="POST", storedtree="zanne2014")

If I use the names in the attached text file in place of PCA_NEW$full_species_ncbi I will be replicating your code?

JoanaBergmann commented 3 years ago

Exactly! Thanks for checking!

sckott commented 3 years ago

It gives me an error using phylomatic too. It's not a loop thing. It just takes a while for the Phylomatic web service to respond, and then it returns an error.

Have you tried phylomatic_local()? It works easily for me with your name list.

phylomatic_local(your_name_vector, taxnames = FALSE, storedtree="zanne2014")

JoanaBergmann commented 3 years ago

I tried but it gives me the following error:

treefull <- phylomatic_local(PCA_NEW$full_species_ncbi, taxnames = FALSE, storedtree="zanne2014")
preparing names...
processing with phylomatic...
Error: call to 'phylomatic' failed with status -1073740940

sckott commented 3 years ago

okay. first thing to try is to just run phylomatic_local many times. the C code is buggy, so it fails randomly unfortunately. let me know if it works at any point after many tries
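
Repeating the call by hand can be automated with a small retry wrapper. This is a minimal base R sketch; `retry` is a hypothetical helper and not part of brranching. Retrying only helps when the failure is intermittent.

```r
# Hypothetical helper: call f() until it succeeds or max_tries is reached.
retry <- function(f, max_tries = 10) {
  last_error <- NULL
  for (i in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)                    # success: stop retrying
    }
    last_error <- result
    message("attempt ", i, " failed: ", conditionMessage(last_error))
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(last_error))
}

# Not run here (needs brranching; your_name_vector is a placeholder):
# tree <- retry(function() phylomatic_local(your_name_vector, taxnames = FALSE,
#                                           storedtree = "zanne2014"),
#               max_tries = 20)
```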

JoanaBergmann commented 3 years ago

I tried 80 times with the same error... any other idea? I am working on windows, could Mac change the outcome? Thanks a lot for investing your time here!

sckott commented 3 years ago

Thanks for testing that out. Sorry about the problem here. It's possible it's Windows-specific. Do you have access to a Mac? If so, I'd try that. If it works on a Mac and is easy enough for you to do, that's probably a good option.

JoanaBergmann commented 3 years ago

It worked well on a mac! Thanks so much for your help!

sckott commented 3 years ago

Great. Sorry it doesn't work better; fixing the C code is not something I can do.