Closed sckott closed 5 years ago
This is how I do it for now, as a wrapper around the original function. The id
column keeps track of which row corresponds to which input.
purrr::map_dfr(
c('10.1093/biomet/79.3.531', '10.1093/biomet/80.3.527'),
oc_doi2ids, .id = 'id'
)
or
library(dplyr)
c('10.1093/biomet/79.3.531', '10.1093/biomet/80.3.527') %>%
lapply(oc_doi2ids) %>%
bind_rows(.id = 'id')
or in base R
local({
lst <- lapply(c('10.1093/biomet/79.3.531', '10.1093/biomet/80.3.527'), oc_doi2ids)
df <- do.call(rbind, lst)
df$id <- rep.int(seq_along(lst), lengths(lst))
df
})
thanks @Selbosh - was just putting this off so i could get out the first version to cran sooner. pushing a change now
@Selbosh if you have time, if you could reinstall and try again, let me know what you think - output data.frame format has changed
thanks @amoeba for his Sparql based brain
Sorry, the latest version has broken it again.
> oc_doi2ids('10.1093/biomet/80.3.527')
Error in `[.data.frame`(df, , c("doi", "pmid", "pmcid", "paper")) :
undefined columns selected
Your code at
https://github.com/ropenscilabs/citecorp/blob/077a1d4e7f7d929587815c29c1772670c386e53f/R/oc_lookup.R#L10
assumes that all of these columns will exist for every input. In the example above, we have df
of the form
> df
doi paper
1 10.1093/biomet/80.3.527 https://w3id.org/oc/corpus/br/3378029
which as you can see has a paper
column and doi
(which was the input value
) but not the others. Rather than subsetting these exact columns (and throwing an error if any are missing), we should be subsetting any of these columns that might exist. For example
df <- df[, names(df) %in% c('doi', 'pmid', 'pmcid', 'paper')]
I will make a pull request with the same. I'd also recommend adding a unit test with this example, as it looks like the existing tests in tests/testthat/test-oc_lookup.R do not cover this situation.
right, i should have included in the tests more identifier egs, including the examples you gave.
For oc_doi2ids, oc_pmid2ids, and oc_pmcid2ids