ropensci / roadoi

Use Unpaywall with R
https://docs.ropensci.org/roadoi

Error: Timeout was reached #22

Open ajorstad opened 6 years ago

ajorstad commented 6 years ago

Hi, several months ago I was successfully using roadoi (thanks for the tool!) for large queries with:

```r
oa_out = roadoi::oadoi_fetch(dois = this_doi, email = "name@email.com")
```

But since I came back to the project two weeks ago, I have not been able to query more than a small number of DOIs at a time before getting this error:

```
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached
```

For example, to extract data for 700 DOIs, I had to restart about 20 times, successfully downloading data for 5-50 DOIs at a time before timing out.

Any idea what is going wrong? I would like to be able to query tens of thousands of DOIs in the near future. Thanks for your help.

njahn82 commented 6 years ago

Sorry to hear that, @ajorstad. I will reach out to the Unpaywall team; it seems the API gets stuck somehow.

ajorstad commented 6 years ago

Thanks, I look forward to the response!

njahn82 commented 6 years ago

Unpaywall Data has had occasional slowdowns recently due to heavy use, but the underlying infrastructure will be updated in the coming weeks.

In the meantime, you could try catching these errors. Here's a reproducible example based on 100 random DOIs that uses purrr::safely() so that a failed request is recorded instead of stopping the whole loop.

```r
library(roadoi)
library(rcrossref)
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.4.2     ✔ dplyr   0.7.4
#> ✔ tidyr   0.8.0     ✔ stringr 1.3.0
#> ✔ readr   1.1.1     ✔ forcats 0.2.0
#> ── Conflicts ────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
# get 100 random DOIs of works published in Nature (`issn = "1476-4687"`) from Crossref
random_dois <- rcrossref::cr_r(sample = 100, filter = c(issn = "1476-4687"))
# call Unpaywall Data safely with purrr
my_data <- purrr::map(random_dois, .f = purrr::safely(function(x)
  roadoi::oadoi_fetch(dois = x, email = "name@example.com")
))
# return tibble with results from Unpaywall data
purrr::map_df(my_data, "result")
#> # A tibble: 100 x 13
#>    doi     best_oa_location oa_locations data_standard is_oa journal_is_oa
#>    <chr>   <list>           <list>               <int> <lgl> <lgl>        
#>  1 10.103… <tibble [1 × 8]> <tibble [1 …             2 T     F            
#>  2 10.103… <tibble [1 × 8]> <tibble [1 …             2 T     F            
#>  3 10.103… <tibble [0 × 0]> <tibble [0 …             2 F     F            
#>  4 10.103… <tibble [0 × 0]> <tibble [0 …             2 F     F            
#>  5 10.103… <tibble [1 × 8]> <tibble [1 …             2 T     F            
#>  6 10.103… <tibble [0 × 0]> <tibble [0 …             2 F     F            
#>  7 10.103… <tibble [1 × 9]> <tibble [1 …             1 T     F            
#>  8 10.103… <tibble [0 × 0]> <tibble [0 …             2 F     F            
#>  9 10.103… <tibble [0 × 0]> <tibble [0 …             1 F     F            
#> 10 10.103… <tibble [0 × 0]> <tibble [0 …             1 F     F            
#> # ... with 90 more rows, and 7 more variables: journal_issns <chr>,
#> #   journal_name <chr>, publisher <chr>, title <chr>, year <chr>,
#> #   updated <chr>, non_compliant <list>
# show error messages
purrr::map(my_data, "error") %>%
  purrr::compact()
#> list()
```

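
purrr::safely() keeps the loop running, but a DOI whose request timed out is simply missing from the results. The code below is a minimal sketch of one way to re-request such DOIs automatically with exponential backoff. It is written in base R and exercised against a mock fetcher, so it runs without network access. `fetch_with_retry()`, `fetch_all()` and `make_flaky_fetcher()` are hypothetical helpers, not part of roadoi or purrr, and the wait times are assumptions rather than Unpaywall recommendations.

```r
# Hypothetical helper (not part of roadoi): call fetch_fun(doi) and, on error,
# retry up to max_attempts times, sleeping base_wait * 2^(attempt - 1) seconds
# between attempts (exponential backoff). Returns a list shaped like the
# output of purrr::safely(): list(result = ..., error = ...).
fetch_with_retry <- function(doi, fetch_fun, max_attempts = 3, base_wait = 2) {
  stopifnot(max_attempts >= 1)
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(fetch_fun(doi), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(result = result, error = NULL))
    }
    if (attempt < max_attempts) {
      Sys.sleep(base_wait * 2^(attempt - 1))
    }
  }
  # All attempts failed: keep the last error so it can be inspected later
  list(result = NULL, error = result)
}

# Hypothetical helper: fetch every DOI (one request at a time, with retries)
# and split the outcome into successful results and DOIs that still failed.
fetch_all <- function(dois, fetch_fun, ...) {
  out <- lapply(dois, fetch_with_retry, fetch_fun = fetch_fun, ...)
  ok <- vapply(out, function(x) is.null(x$error), logical(1))
  list(
    results = lapply(out[ok], `[[`, "result"),
    failed = dois[!ok]
  )
}

# Offline demo: a mock fetcher standing in for roadoi::oadoi_fetch() that
# times out on its first call and succeeds afterwards.
make_flaky_fetcher <- function(failures = 1) {
  calls <- 0
  function(doi) {
    calls <<- calls + 1
    if (calls <= failures) stop("Timeout was reached")
    data.frame(doi = doi, is_oa = TRUE, stringsAsFactors = FALSE)
  }
}

demo <- fetch_all(c("10.1038/a", "10.1038/b"), make_flaky_fetcher(failures = 1),
                  max_attempts = 3, base_wait = 0)
length(demo$results)
demo$failed
```

With the objects from the example above, the real call could look like `fetch_all(random_dois, function(x) roadoi::oadoi_fetch(dois = x, email = "name@example.com"))`. You can then combine the `results` element into one tibble with `dplyr::bind_rows()`. Any DOIs left in `failed` can be queued for a later run instead of restarting the whole job.
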
ajorstad commented 6 years ago

Thank you for your response. We will try to catch the errors as you suggest for now, and hope the infrastructure update happens soon. Thanks again.