ropensci-archive / rorcid

:warning: ARCHIVED :warning: A programmatic interface to the Orcid.org API

Institution searches in `rorcid::orcid_search` #75

Closed ciakovx closed 5 years ago

ciakovx commented 5 years ago
Session Info

```r
 setting  value
 version  R version 3.5.1 (2018-07-02)
 os       Windows 10 x64
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       America/Chicago
 date     2019-07-29

- Packages ------------------------------------------
 package     * version date       lib source
 anytime       0.3.5   2019-07-28 [1] CRAN (R 3.5.3)
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.5.3)
 backports     1.1.4   2019-04-10 [1] CRAN (R 3.5.3)
 broom         0.5.2   2019-04-07 [1] CRAN (R 3.5.3)
 callr         3.3.1   2019-07-18 [1] CRAN (R 3.5.3)
 cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.5.1)
 cli           1.1.0   2019-03-19 [1] CRAN (R 3.5.3)
 colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.5.3)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)
 crul          0.8.0   2019-06-28 [1] CRAN (R 3.5.3)
 curl          4.0     2019-07-22 [1] CRAN (R 3.5.3)
 data.table    1.12.2  2019-04-07 [1] CRAN (R 3.5.3)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.2)
 devtools    * 2.1.0   2019-07-06 [1] CRAN (R 3.5.3)
 digest        0.6.20  2019-07-04 [1] CRAN (R 3.5.3)
 dplyr       * 0.8.3   2019-07-04 [1] CRAN (R 3.5.3)
 evaluate      0.14    2019-05-28 [1] CRAN (R 3.5.3)
 fauxpas       0.2.0   2018-03-01 [1] CRAN (R 3.5.2)
 forcats     * 0.4.0   2019-02-17 [1] CRAN (R 3.5.3)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.5.3)
 generics      0.0.2   2018-11-29 [1] CRAN (R 3.5.1)
 ggplot2     * 3.2.0   2019-06-16 [1] CRAN (R 3.5.3)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.5.3)
 gtable        0.3.0   2019-03-25 [1] CRAN (R 3.5.3)
 haven         2.1.1   2019-07-04 [1] CRAN (R 3.5.3)
 hms           0.5.0   2019-07-09 [1] CRAN (R 3.5.3)
 htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.5.1)
 htmlwidgets   1.3     2018-09-30 [1] CRAN (R 3.5.3)
 httpcode      0.2.0   2016-11-14 [1] CRAN (R 3.5.2)
 httr          1.4.0   2018-12-11 [1] CRAN (R 3.5.1)
 janitor     * 1.2.0   2019-04-21 [1] CRAN (R 3.5.3)
 jsonlite      1.6     2018-12-07 [1] CRAN (R 3.5.1)
 kableExtra  * 1.1.0   2019-03-16 [1] CRAN (R 3.5.3)
 knitr       * 1.23    2019-05-18 [1] CRAN (R 3.5.3)
 lattice       0.20-38 2018-11-04 [1] CRAN (R 3.5.3)
 lazyeval      0.2.2   2019-03-15 [1] CRAN (R 3.5.3)
 listviewer  * 2.1.0   2018-10-07 [1] CRAN (R 3.5.3)
 lubridate     1.7.4   2018-04-11 [1] CRAN (R 3.5.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.3)
 modelr        0.1.4   2019-02-18 [1] CRAN (R 3.5.3)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 3.5.1)
 nlme          3.1-137 2018-04-07 [1] CRAN (R 3.5.3)
 pillar        1.4.2   2019-06-29 [1] CRAN (R 3.5.3)
 pkgbuild      1.0.3   2019-03-20 [1] CRAN (R 3.5.3)
 pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.5.1)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.3)
 prettydoc   * 0.3.0   2019-07-18 [1] CRAN (R 3.5.3)
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.1)
 processx      3.4.1   2019-07-18 [1] CRAN (R 3.5.3)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.5.3)
 purrr       * 0.3.2   2019-03-15 [1] CRAN (R 3.5.3)
 R6            2.4.0   2019-02-14 [1] CRAN (R 3.5.3)
 Rcpp          1.0.2   2019-07-25 [1] CRAN (R 3.5.3)
 readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.5.3)
 readxl        1.3.1   2019-03-13 [1] CRAN (R 3.5.3)
 remotes       2.1.0   2019-06-24 [1] CRAN (R 3.5.3)
 rlang         0.4.0   2019-06-25 [1] CRAN (R 3.5.3)
 rmarkdown     1.14    2019-07-12 [1] CRAN (R 3.5.3)
 rorcid      * 0.5.0   2019-06-07 [1] CRAN (R 3.5.3)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.2)
 rstudioapi    0.10    2019-03-19 [1] CRAN (R 3.5.3)
 rvest         0.3.4   2019-05-15 [1] CRAN (R 3.5.3)
 scales        1.0.0   2018-08-09 [1] CRAN (R 3.5.1)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.3)
 snakecase     0.11.0  2019-05-25 [1] CRAN (R 3.5.3)
 stringi       1.4.3   2019-03-12 [1] CRAN (R 3.5.3)
 stringr     * 1.4.0   2019-02-10 [1] CRAN (R 3.5.3)
 testthat      2.2.1   2019-07-25 [1] CRAN (R 3.5.3)
 tibble      * 2.1.3   2019-06-06 [1] CRAN (R 3.5.3)
 tidyr       * 0.8.3   2019-03-01 [1] CRAN (R 3.5.3)
 tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.5.1)
 tidyverse   * 1.2.1   2017-11-14 [1] CRAN (R 3.5.1)
 triebeard     0.3.0   2016-08-04 [1] CRAN (R 3.5.2)
 urltools      1.7.3   2019-04-14 [1] CRAN (R 3.5.3)
 usethis     * 1.5.1   2019-07-04 [1] CRAN (R 3.5.3)
 vctrs         0.2.0   2019-07-05 [1] CRAN (R 3.5.3)
 viridisLite   0.3.0   2018-02-01 [1] CRAN (R 3.5.1)
 webshot       0.5.1   2018-09-28 [1] CRAN (R 3.5.3)
 whisker       0.3-2   2013-04-28 [1] CRAN (R 3.5.1)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)
 xfun          0.8     2019-06-25 [1] CRAN (R 3.5.3)
 xml2          1.2.1   2019-07-29 [1] CRAN (R 3.5.3)
 yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.1)
 zeallot       0.1.0   2018-01-28 [1] CRAN (R 3.5.3)
```

I'm getting the error `Unknown or uninitialised column: 'orcid-identifier.path'.` with all three `orcid_search` institution parameters:

carberry <- rorcid::orcid_search(family_name = 'carberry', 
                                 current_inst = 'Wesleyan')
iakovakis <- rorcid::orcid_search(family_name = 'iakovakis', 
                                 current_prim_inst = 'Oklahoma State University')
iakovakis2 <- rorcid::orcid_search(family_name = 'iakovakis', 
                                 past_inst = 'Houston')

sckott commented 5 years ago

That was just a warning; these queries all return 0 results. Fixed now to give back an empty data.frame.

ciakovx commented 5 years ago

Thanks for getting back so quickly. Interestingly, when I run the queries below with `orcid()`, I get data back. Do you know why that is?

carberry <- rorcid::orcid(query = 'family-name:carberry AND affiliation-org-name:Wesleyan')
iakovakis <- rorcid::orcid(query = 'family-name:iakovakis AND affiliation-org-name:Oklahoma')

sckott commented 5 years ago

orcid_search() wraps orcid(), but it doesn't allow all possible fields to search on. I just added affiliation_org, which maps to affiliation-org-name, so those examples should work now if you use affiliation_org instead.

You can see what the user-facing params map to with rorcid:::field_match_list
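
To illustrate the idea of mapping user-facing parameters to search fields, here is a minimal sketch. It is not rorcid's actual implementation: `field_map` and `build_query()` are hypothetical names, and the map holds only the two fields mentioned in this thread.

```r
# Hypothetical, simplified mapping (an assumption for illustration only);
# the real one lives in rorcid:::field_match_list.
field_map <- c(
  family_name     = "family-name",
  affiliation_org = "affiliation-org-name"
)

# Hypothetical helper: turn named arguments into a fielded query string,
# joining the individual terms with AND.
build_query <- function(...) {
  args <- list(...)
  unknown <- setdiff(names(args), names(field_map))
  if (length(unknown) > 0) {
    stop("Unsupported search parameter(s): ", paste(unknown, collapse = ", "))
  }
  terms <- paste0(field_map[names(args)], ":", unlist(args))
  paste(terms, collapse = " AND ")
}

query <- build_query(family_name = "carberry", affiliation_org = "Wesleyan")
print(query)
```

A parameter that is missing from the map cannot be translated into a search field, which is why a wrapper like this can only expose the fields it explicitly lists.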

ciakovx commented 5 years ago

Ah I see. Thanks Scott, that's great. Excellent work on orcid_search; it's far easier. I'm updating my walkthrough now. Much appreciated!

ciakovx commented 5 years ago

Sorry to bother you again, but it looks like affiliation_org is operating as an OR, not an AND. Each of the examples below should return a single result, but they return 10, 3, and 5 results respectively.

carberry <- rorcid::orcid_search(family_name = 'carberry', 
                                 affiliation_org = 'Brown')
iakovakis <- rorcid::orcid_search(family_name = 'iakovakis', 
                                 affiliation_org = 'Oklahoma')
boettiger <- rorcid::orcid_search(family_name = 'boettiger',
                                  affiliation_org = 'California')

sckott commented 5 years ago

Hmm, it should be AND. I'm getting 1 row for each of those. Did you reinstall from master? If you did, then run it with verbose = TRUE at the end to see what URL is being sent. Mine looks like https://pub.orcid.org/v3.0/search?q=family-name%3Aboettiger%20AND%20affiliation-org-name%3ACalifornia&rows=10 (which uses AND)
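
To check which operator a request actually used, the URL printed by verbose output can be decoded. A minimal base-R sketch, using the URL quoted above; the variable names are illustrative only:

```r
# URL copied from the comment above.
url <- paste0(
  "https://pub.orcid.org/v3.0/search",
  "?q=family-name%3Aboettiger%20AND%20affiliation-org-name%3ACalifornia&rows=10"
)

# Drop everything up to and including the "?", then split into key=value pairs.
query_string <- sub("^[^?]*\\?", "", url)
pairs <- strsplit(strsplit(query_string, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Percent-decode each value and name it by its key.
params <- vapply(pairs, function(p) utils::URLdecode(p[2]), character(1))
names(params) <- vapply(pairs, function(p) p[1], character(1))

print(params[["q"]])

# TRUE when the two terms are joined with AND.
uses_and <- grepl(" AND ", params[["q"]], fixed = TRUE)
print(uses_and)
```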

ciakovx commented 5 years ago

Thanks, it works; I needed to restart my session. I really appreciate your responsiveness, thank you.

ciakovx commented 5 years ago

It looks like current_inst, current_prim_inst, and past_inst are not supported, and there is no way to designate these on one's ORCID profile. See https://github.com/ORCID/ORCID-Source/blob/master/orcid-api-web/tutorial/search.md:

The entire ORCID record is indexed and can be searched using basic keyword searching. Search can also be limited to the specific fields listed below:

Affiliations data

  • affiliation-org-name
  • ringgold-org-id
  • grid-org-id

sckott commented 5 years ago

hmmm, that's too bad. I guess we should remove them maybe

ciakovx commented 5 years ago

It is too bad, because affiliation-org-name searches across all of an individual's listed affiliations (employment, education, invited positions, membership & service), past and present. So it has the potential to return false positives. In other words, one should not use it to get the ORCID iDs of all individuals currently at an institution, because it will also include those who previously worked there or got their degree there.
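
The false-positive problem can be sketched with toy data. Everything below is made up for illustration: the iDs are placeholders, and the column names (`type`, `end_year`) are assumptions rather than the shape of any real rorcid result. The point is only that an organization-name match needs a second filtering step on affiliation type and end date before it can be read as "currently employed there".

```r
# Toy affiliation records for three hypothetical people who all match a
# search on the same organization name.
affiliations <- data.frame(
  orcid        = c("0000-0000-0000-0001", "0000-0000-0000-0002", "0000-0000-0000-0003"),
  type         = c("employment", "education", "employment"),
  organization = rep("Example University", 3),
  end_year     = c(NA, 2015, 2018),  # NA means no end date, i.e. ongoing
  stringsAsFactors = FALSE
)

# All three rows would be hits for an organization-name search.
all_matches <- affiliations$orcid

# Keep only ongoing employment: this drops the alumnus and the former employee.
current_staff <- subset(affiliations, type == "employment" & is.na(end_year))
print(current_staff$orcid)
```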

I would find it helpful to add the ringgold-org-id and grid-org-id whenever you have some time, if it's not too much trouble.

sckott commented 5 years ago

opened an issue for that

ciakovx commented 5 years ago

Thank you. Also, I'm getting an error for the `keywords` argument: `Error: Internal Server Error (HTTP 500). - org.apache.solr.client.solrj.impl.HttpSolrServer.RemoteSolrException Full validation error: undefined field keywords`

carberry <- rorcid::orcid_search(family_name = 'carberry',
                                 keywords = 'psychoceramics')

sckott commented 5 years ago

opened an issue for that