thanks for the issue @Adafede
In your use case, do you have the ability to split up the components of the citation, e.g., into authors, title, volume, issue, year, etc.?
That is precisely my problem. I work with heterogeneous data. Some of it is extremely clean, with every field atomic; the rest has everything mixed together, with no identifiable separator that would allow the data to be split. (The example I posted is far from being the worst one.)
When I can split, I do, but sadly I sometimes can't do so with confidence.
Have you tried field queries?
I think this may work better:
library(rcrossref)
x <- "Luesch, Hendrik; Yoshida, Wesley Y.; Moore, Richard E.; Paul, Valerie J.; Journal of Natural Products; vol. 63; 10; (2000); p. 1437 - 1439."
z <- cr_works(flq = c(query.bibliographic = x), sort = 'score', order = "desc")
z$data$title[1:2]
#> [1] "Isolation and Structure of the Cytotoxin Lyngbyabellin B and Absolute Configuration of Lyngbyapeptin A from the Marine CyanobacteriumLyngbya majuscula"
#> [2] "Apramides A−G, Novel Lipopeptides from the Marine CyanobacteriumLyngbya majuscula"
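When some records can be split into fields and others cannot, both cases can go through the same field-query path. The following is only a sketch: `build_flq()` and `search_citation()` are hypothetical helpers, not part of rcrossref, and it assumes that `query.author` and `query.container-title` can be combined with `query.bibliographic` in the `flq` argument.

```r
# Hypothetical helper (not part of rcrossref): build the `flq` argument
# for cr_works(). The full citation string always goes to
# query.bibliographic; atomic fields are added only when they are known.
build_flq <- function(citation, author = NULL, journal = NULL) {
  flq <- c(query.bibliographic = citation)
  if (!is.null(author))  flq <- c(flq, query.author = author)
  if (!is.null(journal)) flq <- c(flq, `query.container-title` = journal)
  flq
}

# Hypothetical wrapper: needs the rcrossref package and network access.
# Returns the top `n` records sorted by descending score.
search_citation <- function(citation, ..., n = 3) {
  res <- rcrossref::cr_works(
    flq = build_flq(citation, ...),
    sort = "score", order = "desc", limit = n
  )
  res$data
}
```

For an unsplittable string, `search_citation(x)` would issue the same request as the call above (plus a `limit`); for a clean record, the known fields can be passed as `author =` and `journal =`.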
I had a quick look at them, thinking they could indeed be a good option, but have not tested them yet!
I'll try with your suggestion and come back to you.
Thank you very much :)
Hi, coming back to you again: it does indeed work much better!
Thank you very much!
great, glad it works!
Session Info
```r
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] fr_CH.UTF-8/fr_CH.UTF-8/fr_CH.UTF-8/C/fr_CH.UTF-8/fr_CH.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] zoo_1.8-8 XML_3.99-0.3 webchem_1.0.0 UpSetR_1.4.0 forcats_0.5.0
 [6] tidyr_1.1.0 tibble_3.0.1 tidyverse_1.3.0 taxize_0.9.96 stringr_1.4.0
[11] stringi_1.4.6 splitstackshape_1.4.8 rvest_0.3.5 xml2_1.3.2 reticulate_1.16
[16] rentrez_1.2.2 readxl_1.3.1 readr_1.3.1 rcrossref_1.0.0 RColorBrewer_1.1-2
[21] purrr_0.3.4 pbmcapply_1.5.0 jsonlite_1.6.1 igraph_1.2.5 ggraph_2.0.3
[26] eulerr_6.1.0 dplyr_1.0.0 digest_0.6.25 data.table_1.12.8 collapsibleTree_0.1.7
[31] chorddiag_0.1.2 ChemmineR_3.40.0 plotly_4.9.2.1 Hmisc_4.4-0 ggplot2_3.3.1
[36] Formula_1.2-3 survival_3.1-12 lattice_0.20-41

loaded via a namespace (and not attached):
 [1] colorspace_1.4-1 rjson_0.2.20 ellipsis_0.3.1 htmlTable_1.13.3 fs_1.4.1 base64enc_0.1-3
 [7] httpcode_0.3.0 rstudioapi_0.11 farver_2.0.3 urltools_1.7.3 graphlayouts_0.7.0 ggrepel_0.8.2
[13] DT_0.13 lubridate_1.7.8 fansi_0.4.1 codetools_0.2-16 splines_4.0.0 bold_1.0.0
[19] knitr_1.28 polyclip_1.10-0 broom_0.5.6 dbplyr_1.4.4 cluster_2.1.0 png_0.1-7
[25] ggforce_0.3.1 shiny_1.4.0.2 data.tree_0.7.11 compiler_4.0.0 httr_1.4.1 backports_1.1.7
[31] assertthat_0.2.1 Matrix_1.2-18 fastmap_1.0.1 lazyeval_0.2.2 cli_2.0.2 later_1.1.0.1
[37] tweenr_1.0.1 acepack_1.4.1 htmltools_0.4.0 tools_4.0.0 gtable_0.3.0 glue_1.4.1
[43] rsvg_2.1 tinytex_0.23 Rcpp_1.0.4.6 cellranger_1.1.0 vctrs_0.3.1 crul_0.9.0
[49] ape_5.4 nlme_3.1-148 iterators_1.0.12 xfun_0.14 mime_0.9 miniUI_0.1.1.1
[55] lifecycle_0.2.0 MASS_7.3-51.6 scales_1.1.1 tidygraph_1.2.0 hms_0.5.3 promises_1.1.0
[61] curl_4.3 gridExtra_2.3 triebeard_0.3.0 rpart_4.1-15 reshape_0.8.8 latticeExtra_0.6-29
[67] foreach_1.5.0 checkmate_2.0.0 bibtex_0.4.2.2 rlang_0.4.6 pkgconfig_2.0.3 bitops_1.0-6
[73] htmlwidgets_1.5.1 tidyselect_1.1.0 plyr_1.8.6 magrittr_1.5 R6_2.4.1 generics_0.0.2
[79] DBI_1.1.0 haven_2.3.1 pillar_1.4.4 foreign_0.8-80 withr_2.2.0 RCurl_1.98-1.2
[85] nnet_7.3-14 modelr_0.1.8 crayon_1.3.4 viridis_0.5.1 jpeg_0.1-8.1 grid_4.0.0
[91] blob_1.2.1 reprex_0.3.0 xtable_1.8-4 httpuv_1.5.4 munsell_0.5.0 viridisLite_0.3.0
```
Hi,
Another issue, or question this time:
Let's take the following entry (X) as an example:
After running reference retrieval (`cr_works(query = X, sort = 'score', order = "desc")`), I obtain, at rank 1, a result with a score of 72.79652. This result is WRONG.
When giving only part of the entry (Y) (`cr_works(query = Y, sort = 'score', order = "desc")`), I obtain, at rank 1, a result with a score of 26.922556. This result is CORRECT.
Conclusion: more information, with a higher score, leads to the wrong result, while less information, with a lower score, leads to the correct one (deliberately phrased this way...).
How would you judge this? Is there any option I missed that might help with this kind of problem?
Many thanks
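
Since the two scores above come from different queries, one possible way to judge a hit without relying on the score is to check whether the numbers in the raw citation (year, volume, pages) reappear in the candidate's metadata. The following is a rough sketch of that idea, not a feature of rcrossref: `numeric_overlap()` and `check_hits()` are hypothetical helpers, and the column names `volume`, `issue`, `page` and `issued` are assumptions about the data frame returned by `cr_works()`.

```r
# Hypothetical helper: fraction of the numeric tokens in the raw citation
# (year, volume, pages, ...) that also occur in a candidate's metadata.
numeric_overlap <- function(citation, candidate_text) {
  nums <- function(s) unique(regmatches(s, gregexpr("[0-9]+", s))[[1]])
  wanted <- nums(citation)
  if (length(wanted) == 0) return(NA_real_)
  mean(wanted %in% nums(candidate_text))
}

# Hypothetical helper: add an `overlap` column to a data frame of hits.
# The default column names are assumed, not guaranteed; only those that
# are actually present in `hits` are used.
check_hits <- function(citation, hits,
                       cols = c("volume", "issue", "page", "issued")) {
  cols <- intersect(cols, names(hits))
  stopifnot(length(cols) > 0)
  # One metadata string per row, e.g. "<volume> <issue> <page> <issued>".
  meta <- do.call(paste, c(unname(as.list(hits[cols])), sep = " "))
  hits$overlap <- unname(vapply(
    meta,
    function(m) numeric_overlap(citation, m),
    numeric(1)
  ))
  hits
}

# Usage (needs rcrossref and network access):
# hits <- rcrossref::cr_works(query = X, sort = "score", order = "desc")$data
# check_hits(X, hits)[, c("title", "score", "overlap")]
```

A rank-1 hit with a low overlap would then be a candidate for manual review, whatever its score.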