ropensci / citecorp

Client for the Open Citations Corpus
https://docs.ropensci.org/citecorp
Other
11 stars 4 forks source link

allow length > 1 id input in id converter fxns #1

Closed sckott closed 5 years ago

sckott commented 5 years ago

For oc_doi2ids, oc_pmid2ids, and oc_pmcid2ids

Selbosh commented 5 years ago

This is how I do it for now, as a wrapper around the original function. The id column keeps track of which row corresponds to which input.

purrr::map_dfr(
  c('10.1093/biomet/79.3.531', '10.1093/biomet/80.3.527'),
  oc_doi2ids, .id = 'id'
)

or

library(dplyr)
c('10.1093/biomet/79.3.531', '10.1093/biomet/80.3.527') %>%
  lapply(oc_doi2ids) %>%
  bind_rows(.id = 'id')

or in base R

local({
  lst <- lapply(c('10.1093/biomet/79.3.531', '10.1093/biomet/80.3.527'), oc_doi2ids)
  df <- do.call(rbind, lst)
  df$id <- rep.int(seq_along(lst), lengths(lst))
  df
})
sckott commented 5 years ago

thanks @Selbosh - was just putting this off so i could get out the first version to cran sooner. pushing a change now

sckott commented 5 years ago

@Selbosh if you have time, if you could reinstall and try again, let me know what you think - output data.frame format has changed

sckott commented 5 years ago

thanks @amoeba for his Sparql based brain

Selbosh commented 5 years ago

Sorry, the latest version has broken it again.

> oc_doi2ids('10.1093/biomet/80.3.527')
Error in `[.data.frame`(df, , c("doi", "pmid", "pmcid", "paper")) : 
  undefined columns selected

Your code at https://github.com/ropenscilabs/citecorp/blob/077a1d4e7f7d929587815c29c1772670c386e53f/R/oc_lookup.R#L10 assumes that all of these columns will exist for every input. In the example above, we have df of the form

> df
                    doi                                 paper
1 10.1093/biomet/80.3.527 https://w3id.org/oc/corpus/br/3378029

which as you can see has a paper column and doi (which was the input value) but not the others. Rather than subsetting these exact columns (and throwing an error if any are missing), we should be subsetting any of these columns that might exist. For example

df <- df[, names(df) %in% c('doi', 'pmid', 'pmcid', 'paper')]

I will make a pull request with the same. I'd also recommend adding a unit test with this example, as it looks like the existing tests in tests/testthat/test-oc_lookup.R do not cover this situation.

sckott commented 5 years ago

right, i should have included in the tests more identifier egs, including the examples you gave.