ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

Using coro to create a generator #184

trangdata closed this 7 months ago

trangdata commented 11 months ago

Example use case:

query_url <- "https://api.openalex.org/works?filter=cites%3AW2755950973"
oar <- oa_generate(query_url)
for (i in seq.int(1000)) {
  # alternatively, loop until coro::is_exhausted(record_i) is TRUE
  record_i <- oar()
  # process record_i here, for example:
  # saveRDS(record_i, paste0("rec-", i, ".rds"))
}

# or, occasionally, if you know the result is a manageable list that fits in memory
# you can use coro::collect to get the entire list of results
# (although in this case we would recommend oa_request instead)
system.time({
  results <- coro::collect(oar)
})
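
For the "loop until exhausted" variant mentioned in the comment above, here is a minimal sketch. It assumes the function returned by oa_generate behaves like a regular coro generator, which the coro::collect call above suggests. To keep it runnable without network access, `fake_records` is a hypothetical stand-in generator and not part of openalexR.

```r
library(coro)

# Hypothetical stand-in: yields three fake "records" and then finishes.
# In real use, `oar` would be the function returned by oa_generate(query_url).
fake_records <- generator(function() {
  for (id in c("W1", "W2", "W3")) {
    yield(list(id = id))
  }
})
oar <- fake_records()

seen <- character()
repeat {
  record_i <- oar()
  # coro signals the end of iteration with a sentinel value;
  # is_exhausted() tests for it, so no string comparison is needed.
  if (is_exhausted(record_i)) break
  # process record_i here
  seen <- c(seen, record_i$id)
}
seen
```

This avoids guessing the number of records up front, as the fixed seq.int(1000) loop has to.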

rkrug commented 11 months ago

I personally think this introduces a lot of complexity, both into the package and for the user. It might solve the issue (I can't say, because I do not understand coro), but from my perspective it is heavier implementation-wise.

rkrug commented 11 months ago

There is the convenience function oa_snowball(), which would also benefit from something along these lines, but I think the implementation could be easier there, since the input papers need to be split into chunks?

trangdata commented 7 months ago

Now works with group_by:

library(openalexR)
query_url <- "https://api.openalex.org/works?search=biodiversity&group_by=primary_topic.id"
oar <- oa_generate(query_url)
for (i in seq.int(202)){
  record_i <- oar()
}
record_i
#> $key
#> [1] "https://openalex.org/T10207"
#> 
#> $key_display_name
#> [1] "DNA Nanotechnology and Bioanalytical Applications"
#> 
#> $count
#> [1] 55

Created on 2024-02-14 with reprex v2.0.2

Closes #207