ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

Feature Request: Progress bar for fetching large number of works #127

Closed rkrug closed 10 months ago

rkrug commented 1 year ago

I am doing a snowball search for around 2000 works, which obviously takes quite some time.

Would it be possible to add an optional progress bar to oa_fetch(), so that one can see the progress of the fetching?

Thanks,

Rainer

yjunechoe commented 1 year ago

Both oa_fetch() and oa_snowball() have a verbose argument you can set to TRUE to get a progress bar.

Using your example from the other issue, the output should look something like this:

ids <- c("W1896013598", "W312683970", "W2084630927")

ilk_snowball <- oa_snowball(
  identifier = ids,
  verbose = TRUE
)
# Requesting url: https://api.openalex.org/works?filter=openalex_id%3AW1896013598%7CW312683970%7CW2084630927
# Getting 1 page of results with a total of 3 records...
# Collecting all documents citing the target papers...
# Requesting url: https://api.openalex.org/works?filter=cites%3AW1896013598%7CW312683970%7CW2084630927
# Getting 2 pages of results with a total of 221 records...
# OpenAlex downloading [=====================] 100% eta:  0s
# converting [===============================] 100% eta:  0s
# Collecting all documents cited by the target papers...
# Requesting url: https://api.openalex.org/works?filter=cited_by%3AW1896013598%7CW312683970%7CW2084630927
# Getting 1 page of results with a total of 77 records...
# converting [===============================] 100% eta:  0s

Does this provide what you need?

rkrug commented 1 year ago

Partly - it would be nice if one could see only the progress bars and not the URLs. As you can see in my real-life output below, the URLs take up considerable space. But it is an option at the moment.

Screenshot 2023-07-19 at 13 03 03
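As a possible stopgap on the user side, messages matching a pattern can be muffled with base R's condition handling while everything else passes through. The sketch below assumes the URL line is emitted via message() and starts with "Requesting url"; fake_fetch() and without_url_messages() are hypothetical names used only for illustration, and whether the progress bars themselves survive depends on how they are written to the console.

```r
# Stand-in for a verbose fetch; a real oa_fetch()/oa_snowball() call
# would take its place. This function is hypothetical.
fake_fetch <- function() {
  message("Requesting url: https://api.openalex.org/works?filter=cites%3AW1896013598")
  message("Getting 2 pages of results with a total of 221 records...")
  invisible("done")
}

# Hypothetical helper: evaluate `expr`, dropping only the messages
# whose text matches `pattern`. Other messages are left untouched.
without_url_messages <- function(expr, pattern = "^Requesting url") {
  withCallingHandlers(
    expr,
    message = function(m) {
      if (grepl(pattern, conditionMessage(m))) {
        invokeRestart("muffleMessage")
      }
    }
  )
}

# Only the "Getting 2 pages ..." line is shown.
result <- without_url_messages(fake_fetch())
```

Wrapping the call this way leaves the verbose argument alone, so it does not need any change in the package.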

yjunechoe commented 1 year ago

Hm yeah I can see how that'd be useful.

This actually seems pretty specific to the case of encoded URLs that run very long. I can imagine a simple fix: a global option that suppresses printing the long "Requesting url: ..." line, since that's only useful for debugging and not really of interest to general users.

https://github.com/ropensci/openalexR/blob/66f07433b5efbdff16c581da9fdb754fb649fb4b/R/oa_fetch.R#L566

Thanks for the screenshot - I'll wait for others to comment!
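A minimal sketch of what such a global option could look like, not the package's actual code: request_page() is a hypothetical stand-in for the internal request step, and mypkg.show_url is an invented option name rather than an openalexR option.

```r
# Hypothetical stand-in for the package's internal request step.
# `mypkg.show_url` is an invented option name, not an openalexR option.
request_page <- function(query_url, verbose = FALSE) {
  # The URL line is shown by default; users opt out via options().
  show_url <- isTRUE(getOption("mypkg.show_url", default = TRUE))
  if (verbose && show_url) {
    message("Requesting url: ", query_url)
  }
  if (verbose) {
    message("Getting 1 page of results...")
  }
  invisible(query_url)
}

url <- "https://api.openalex.org/works?filter=cites%3AW1896013598"

# Default: verbose output includes the URL line.
request_page(url, verbose = TRUE)

# Opt out of the URL line for the rest of the session.
options(mypkg.show_url = FALSE)
request_page(url, verbose = TRUE)
```

Reading the option inside the function keeps the signature unchanged, which is the convenience for the maintainer; the cost is that the setting lives outside the call and is easy to overlook.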

rkrug commented 1 year ago

True. But I would not change the verbose behaviour, but rather add another argument, e.g. progress.

rkrug commented 1 year ago

Or different verbosity levels - FALSE, TRUE, progress, ...
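To make the verbosity-levels idea concrete, here is a rough sketch of how FALSE, TRUE, and "progress" could be mapped onto two internal switches. resolve_verbosity() and the show_url / show_progress names are hypothetical and do not come from openalexR.

```r
# Hypothetical normaliser: maps FALSE / TRUE / "progress" onto two switches.
# None of these names come from openalexR.
resolve_verbosity <- function(verbose = FALSE) {
  if (is.logical(verbose) && length(verbose) == 1L && !is.na(verbose)) {
    # Existing behaviour: TRUE shows everything, FALSE shows nothing.
    return(list(show_url = verbose, show_progress = verbose))
  }
  if (identical(verbose, "progress")) {
    # New level: progress bars only, no URL lines.
    return(list(show_url = FALSE, show_progress = TRUE))
  }
  stop("`verbose` must be TRUE, FALSE, or \"progress\".", call. = FALSE)
}

str(resolve_verbosity("progress"))
```

Existing calls with verbose = TRUE or verbose = FALSE would behave as before under this mapping; only the new "progress" value changes what is printed.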

yjunechoe commented 1 year ago

Yeah, those are certainly solutions as well. It's just different tradeoffs in convenience for the maintainer vs. the user. Not sure which is best myself, but I'll let the others sit on it for a bit.

trangdata commented 1 year ago

This actually seems pretty specific to the case of encoded URLs that run very long. I can imagine a simple fix: a global option that suppresses printing the long "Requesting url: ..." line, since that's only useful for debugging and not really of interest to general users.

I like this idea! 💯 Implemented in 3e58640.