ropensci-archive / rorcid

:warning: ARCHIVED :warning: A programmatic interface the Orcid.org API
Other
109 stars 13 forks source link

Add support to extract BibTeX entries direcly from ORCID (if available) #74

Closed RLumSK closed 5 years ago

RLumSK commented 5 years ago

Summary

The function rorcid::orcid_citations()attempts to get BibTeX records from Crossref. However, if this fails (usually for entries without DOI), only the record from ORCID is returned. However, this record may have already a BibTeX entry. It just needs to get extracted. My commit attempts to implement support to extract those records until the ORCID API provides such functionality.

Description

The approach may appear a little bit heavy, but I tested nearly 100 different records and the way finally chosen, reflects and addresses various difficulties I encountered.

Related Issue

I am not aware of related issues, I do hope I did not overlook something. The only comment I did find was in the manual itself, where it is written:

Right now we get JSON citations back. We'd like to support bibtex format. DOI.org supports this but not ORCID.

Example

The following call will extract all publications from my ORCID profile, which include publications without DOI. In the 'old' version 'only' the standard ORCID record was returned.

pubs <- rorcid::orcid_citations("0000-0002-0734-2199")

Tests

The file 'orcid_citations.R' is not yet covered by tests, so I did not add a test, but of course I can support writting a proper test for the function.

sckott commented 5 years ago

thanks @RLumSK - I'll have a look and get back to you. In the meantime, there's a lot of whitespace changes in this PR, minimizing those would be great - send another commit removing them if possible

RLumSK commented 5 years ago

Thanks @sckott for your brief response. I tried to understand the 'there's a lot of whitespace changes in this PR', but I am not sure if I got you. Obviously I did not touch the whitespace, since the lines are the same, and of course I did not manually removed them. I used RStudio and the git GUI therein. Do you have any idea how to tell RStudio not to touch the whitespace (I did not use another editor)?

To test, tried the changes again (hard reset and then changing the lines in the code only), the problem is that on my computer I do not see those whitespace changes (at least not in RStudio). Of course I see them now on GitHub. I would like to minimise this, but for them moment it is not entirely clear to me how it was caused. If you have any advice, I am happy to learn.

sckott commented 5 years ago

Okay, if it's not easy I can just fix that later. no worries

RLumSK commented 5 years ago

@sckott Unbelivable, I found an option 'Strip trailing horizontal whitespace when saving' in the RStudio settings (and it was checked). I had no idea that this option exists at all. I am sorry for the trouble. I will make a new commit with all this whitespace left where it is.

RLumSK commented 5 years ago

Ok, sorry. I have no idea why it is not working, the option is switched off in RStudio, but it still strips the whitespace. For the tries above I reverted my commits and then I only copied & pasted the modified code lines; nothing more. Probably something else is also working in the back not liking the whitespace.

sckott commented 5 years ago

no worries, i'll take a look

RLumSK commented 5 years ago

@sckott

maybe i'm missing something, but couldn't you do jsonlite::fromJSON(cite_put(orcid, pc, ctype = "application/vnd.orcid+json; qs=4")), which gives a list, then you can index to the citaiton like x$citation$citation-value??

You are right with your suggestion to use jsonlite::fromJSON(). I was so happy with parsing the string using regular expressions that it did not cross my mind. But you are right, given that you already use the package, this solution is much better and cleaner. So I modified the function another time.

I also added the tests you have requested. I hope the way I implemented them follows your current package style in an accetable manner. Of coure, I will modify them again if you are unhappy.

sckott commented 5 years ago

thanks, having another look

sckott commented 5 years ago

You can't link in the docs to a method/fxn that doesn't have a manual entry/file, because that attempts to create a link to a manual file, and you shouldn't link to files that don't exist - so the [cite_put] in extract_bibtex can just be wrapped in backticks


on running the test in test-orcid_citatitons.R: i get a warning, from data.table::rbindlist

orcid_citations(orcid = "0000-0002-0734-2199", put_code = "40038622")
#> # A tibble: 1 x 5
#>        put ids   type  format citation
#>      <int> <lgl> <lgl> <chr>  <chr>
#> 1 40038622 NA    NA    bibtex "@article{Kreutzer_2012coy,\n\tauthor = {Kreutzer, Sebastian and Schmidt, Christoph and Fuchs, Margret C…
#> Warning message:
#> In data.table::rbindlist(x, use.names = TRUE, fill = TRUE) :
#>   Column 2 ['ids'] of item 1 is length 0. This (and 1 other like it) has been filled with NA (NULL #> for list columns) to make each item uniform.

which I think comes from the as_dt() function call. This makes me wonder if theres's some parsing in your new function that could be fixed, or If we just need to wrap data.table::rbindlist in suppressWarnings


tests:

RLumSK commented 5 years ago

Some response to ask you for advice before I'll make a new PR:

You can't link in the docs to a method/fxn

I did use @noRd which means that 'roxygen2' does nothing and does not link. Otherwise, you would have seen an error message while checking the package. My idea was: If you decide to export this function one day, you have already a brief but working inline documentation. Should I keep it or change it as you had suggested it?

on running the test in test-orcid_citatitons.R: i get a warning, from data.table::rbindlist

I saw the warning, but it makes sense, and I did not consider it a bug but a logical consequence. If I am not overlooking something, the ids column is usually filled with the DOI. However, if no DOI exists, the field remains empty. Or do you want to fill me this with some other id?

tests:

D'accord.

sckott commented 5 years ago

i realize you used noRd, but when you run R CMD check or equivalent with other tools there is a warning about that syntax. So whether or not the noRd is there, it should be changed.

The data.table warning is okay, but I think many users will think there is something wrong, so i think best if we wrap the data.table::rbindlist call in suppressWarnings

RLumSK commented 5 years ago

@sckott: Thanks for your patient support. I did as requested. Remarks:

sckott commented 5 years ago

thanks for the changes.

I run check on the command line, like Rscript -e 'devtools::check(document = FALSE, cran = TRUE)'

For the cassettes, see https://ropenscilabs.github.io/http-testing-book/vcr-intro.html#vcr-enabled-testing - Need to run devtools::test() to have the cassettes created in the correct place. devtools::check() runs in a temporary directory and so all fiels are thrown away

RLumSK commented 5 years ago

Thanks for your feedback and the explanations. I did the test as you, but I did not see those errors, ok. Never mind, probably I have been overlooking something and I will have an eye on it the next time.

Besides, the devtools::test() did not work. It acts very aggressive and opens a lot of browser windows and flushes the console with error messages. Eventually, the tests stop due to 'too many failures'. After I deleted all yml-files some of the errors disappeared, but I did not manage to run the tests that way. Contrary, if I run the test manually they work (I run only the tests I contributed).

sckott commented 5 years ago

okay, I'm interested as to why devtools::test() is behaving differently. but for now we'll merge this. I'll ask in another thread about sharing what errors you're seeing when running tests

thanks again for your work on this!