ropensci / rcrossref

R client for various CrossRef APIs
https://docs.ropensci.org/rcrossref

where is the next-cursor outputted? #157

Closed ataiprojects closed 6 years ago

ataiprojects commented 6 years ago

I expected it to be in res$meta, but it's not there. I think it would be helpful to include more details on this in the documentation. Thank you!

sckott commented 6 years ago

Can you please give more details? What is your sessionInfo(), and which function(s) are you talking about?

ataiprojects commented 6 years ago

Sorry, I thought next-cursor was only ever used with deep paging, so it wouldn't be ambiguous. Details:

res1 = cr_works(query="ecology")

res1$meta
  total_results search_terms start_index items_per_page
1        320673      ecology           0             20

This tells me there are 320+ thousand works on ecology. Let's say I would like to gather metadata on the first 12 thousand of those and analyse it. res2 = cr_works(query="ecology", cursor = "*", cursor_max = 1000) would get me the first thousand. To look at the second thousand, I would need the next-cursor value to substitute for the *, right? Where do I get it?

Also, I have tried larger cursor_max values: 1100 works, but with 2000 I get:

Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) :
  Timeout was reached: Connection timed out after 10000 milliseconds

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rcrossref_0.8.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14     bindr_0.1        xml2_1.1.1       magrittr_1.5     xtable_1.8-2     R6_2.2.1         rlang_0.2.0
 [8] bibtex_0.4.2     stringr_1.2.0    plyr_1.8.4       dplyr_0.7.4      tools_3.4.3      miniUI_0.1.1     htmltools_0.3.6
[15] assertthat_0.2.0 digest_0.6.12    tibble_1.3.3     bindrcpp_0.2     shiny_1.0.4      triebeard_0.3.0  curl_3.1
[22] crul_0.5.2       glue_1.1.1       mime_0.5         stringi_1.1.5    compiler_3.4.3   urltools_1.6.0   jsonlite_1.5
[29] httpuv_1.3.5     pkgconfig_2.0.1

Thank you.

sckott commented 6 years ago

The next-cursor is only returned from the Crossref API if you use the cursor parameter. As you showed above, cursor = "*" turns on deep paging through cursors. We then do the paging automatically, so you don't need to do it yourself.

Here's an example to get the first 12K:

res3 <- cr_works(query="ecology", cursor = "*", cursor_max = 12000L, limit = 1000L)
NROW(res3$data)
head(res3$data)
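
For readers wondering what "we do the paging automatically" amounts to, here is a minimal, network-free sketch of a cursor loop. It is not rcrossref's actual implementation. `fetch_page` and `page_with_cursor` are hypothetical names, and `fetch_page` is a mock that stands in for one request to the Crossref /works route, which returns a page of items together with a next-cursor value.

```r
# Hypothetical stand-in for one Crossref request: returns a page of items
# plus the cursor for the next page. A real client would issue an HTTP
# request here and read the next-cursor value from the response.
fetch_page <- function(cursor, rows, total = 2500L) {
  start <- if (identical(cursor, "*")) 0L else as.integer(cursor)
  n <- max(0L, min(rows, total - start))
  list(
    items = if (n > 0L) seq.int(start + 1L, start + n) else integer(0),
    next_cursor = as.character(start + n)
  )
}

# Follow cursors until cursor_max records are collected or a page is empty.
page_with_cursor <- function(fetch, cursor_max = 5000L, limit = 1000L) {
  cursor <- "*"          # "*" asks for the first page
  out <- list()
  got <- 0L
  while (got < cursor_max) {
    page <- fetch(cursor, limit)
    if (length(page$items) == 0L) break   # no more records
    out[[length(out) + 1L]] <- page$items
    got <- got + length(page$items)
    cursor <- page$next_cursor            # feed the returned cursor back in
  }
  head(unlist(out), cursor_max)           # trim any overshoot (sketch's choice)
}

res <- page_with_cursor(fetch_page, cursor_max = 1200L, limit = 500L)
length(res)
```

The caller never handles the intermediate cursors here, which is why no next-cursor value needs to be read out of res$meta when paging this way.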

ataiprojects commented 6 years ago

Thanks! I'll continue testing later, and if there are timeout errors as you say, I'll post another issue.