ropensci / roadoi

Use Unpaywall with R
https://docs.ropensci.org/roadoi

Question about stop condition in roadoi::oadoi_fetch #9

Closed cassebas closed 7 years ago

cassebas commented 7 years ago

Hi, first off thanks for this tool :) I was wondering whether there is a particular reason for stopping the script when either a DOI is incorrect or oaDOI doesn't return results within the timeout. When the list of DOIs passed to the function is large, the probability of getting no results at all increases: if, say, the last DOI in the list is incorrect, the function returns nothing, even for the correct DOIs (the same goes for timeouts). Is there any reason for not using a tryCatch mechanism and 'remembering' the erroneous DOIs? (Or is it a feature not yet implemented? ;-) Thanks, Caspar Treijtel (Library of the University of Amsterdam)

njahn82 commented 7 years ago

Hi Caspar, good question!

The reason for stopping when something goes wrong is to ensure that only expected metadata is returned and that oaDOI is not bombarded with erroneous requests. However, there is an easy way to use oadoi_fetch() so that errors are caught: call it once per DOI, wrapped in the failwith() function from the plyr package. failwith() returns a default value (NULL unless you specify otherwise) instead of raising the error, so one bad DOI no longer aborts the whole run.

Here's a reproducible example, which I will add to the documentation for users who want to catch possible errors:

# Get 100 random dois using rcrossref package
random_dois <- rcrossref::cr_r(sample = 100)
# call oadoi_fetch per DOI using plyr::failwith
purrr::map_df(random_dois, 
              plyr::failwith(f = function(x) roadoi::oadoi_fetch(x, email = "najko.jahn@gmail.com")))
#> # A tibble: 100 x 20
#>                                                        `_best_open_url`
#>                                                                   <chr>
#>  1                                                                 <NA>
#>  2                                                                 <NA>
#>  3                          http://dx.doi.org/10.1016/j.cub.2015.06.026
#>  4                                                                 <NA>
#>  5                                                                 <NA>
#>  6                                                                 <NA>
#>  7                                                                 <NA>
#>  8                                                                 <NA>
#>  9 https://naldc.nal.usda.gov/naldc/download.xhtml?id=31820&content=PDF
#> 10                                                                 <NA>
#> # ... with 90 more rows, and 19 more variables: `_closed_base_ids` <list>,
#> #   `_green_base_collections` <list>, `_open_base_ids` <list>,
#> #   `_open_urls` <list>, doi <chr>, doi_resolver <chr>, evidence <chr>,
#> #   found_green <lgl>, found_hybrid <lgl>, free_fulltext_url <chr>,
#> #   is_boai_license <lgl>, is_free_to_read <lgl>,
#> #   is_subscription_journal <lgl>, license <chr>, oa_color <chr>,
#> #   oa_color_long <chr>, reported_noncompliant_copies <list>, url <chr>,
#> #   year <int>
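
The failwith() approach above silently drops the failing DOIs. If you also want to 'remember' them, as asked in the original question, a tryCatch loop in base R is one option. The following is only a minimal sketch, not part of roadoi: fetch_one() is a hypothetical stand-in that fails on malformed DOIs so that the example runs offline, and fetch_all() is likewise a made-up helper name. For real use you would presumably pass a wrapper around roadoi::oadoi_fetch() as the fetch argument.

```r
# Hypothetical stand-in for roadoi::oadoi_fetch(): it raises an error for
# anything that does not look like a DOI, so the sketch runs without network.
fetch_one <- function(doi) {
  if (!grepl("^10\\.", doi)) stop("invalid DOI: ", doi)
  data.frame(doi = doi, stringsAsFactors = FALSE)
}

# Call `fetch` once per DOI, keep the successful results,
# and record every DOI whose request raised an error.
fetch_all <- function(dois, fetch = fetch_one) {
  results <- list()
  failed <- character(0)
  for (doi in dois) {
    # An error is turned into NULL instead of stopping the loop
    res <- tryCatch(fetch(doi), error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, doi)
    } else {
      results[[length(results) + 1]] <- res
    }
  }
  list(data = do.call(rbind, results), failed = failed)
}

out <- fetch_all(c("10.1016/j.cub.2015.06.026", "not-a-doi"))
out$failed
#> [1] "not-a-doi"
```

The failed vector can then be inspected or retried later, while out$data holds the rows for the DOIs that worked.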

cassebas commented 7 years ago

Hi Najko, thanks for your quick response and explanation, this is exactly what I needed!

njahn82 commented 7 years ago

Glad it works for you. I've added the example to the vignette and the README.