gorbynet closed this issue 5 years ago
thanks @gorbynet
I assume you mean with deep paging? Looking into it; I haven't done progress bars before in Python.
clint is another option
Scott, thanks for picking this up. I'm not sure how it would work (I'm quite inexperienced at Python) but what I meant was that it would be useful to have some way of showing progress when deep paging through a large dataset. That might be a progress bar, or some way of the crossref module feeding back to the calling script so that the script can choose how to reflect that information.
thanks, i'll experiment and ask for your feedback
this is more complicated than i thought. i think we need a rough idea of how many requests we'll make for a progress bar to work.
we're doing requests in a while loop https://github.com/sckott/habanero/blob/master/habanero/request_class.py#L68-L73 - we need to look at total results found by the server and also cursor_max
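A rough sketch of the estimate described above, not the code in habanero itself. It assumes the server reports a total result count and that paging stops at whichever is smaller, that total or `cursor_max`. The function name `estimate_requests` and its parameters are hypothetical.

```python
import math


def estimate_requests(total_results, limit, cursor_max):
    """Estimate how many paged requests a deep-paging loop will make.

    Hypothetical helper: total_results is the count reported by the server,
    limit is the page size, cursor_max is the cap on records to retrieve.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Paging stops at the server total or at the cursor_max cap,
    # whichever is reached first.
    records_to_fetch = min(total_results, cursor_max)
    # At least one request is always made, even when nothing matches.
    return max(1, math.ceil(records_to_fetch / limit))
```

For example, 12000 matches with a page size of 500 and a cap of 5000 records would give an estimate of 10 requests. The real count can differ if the server returns short pages, so this is only good enough to size a progress bar.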
okay, install from the progress-bar branch
pip3 install --user https://github.com/sckott/habanero/archive/progress-bar.zip
and try e.g.,
from habanero import Crossref
cr = Crossref()
res = cr.works(query = "octopus", cursor = "*", limit = 500, progress_bar = True)
let me know what you think
the progress bar is not integrated into requests. it's only activated when deep paging: it's just a progress bar on the while loop that uses an estimate of how many requests will be done
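To illustrate the idea of a progress bar wrapped around a paging while loop, here is a minimal standard-library sketch. It is not the habanero implementation: `page_all`, `render_bar`, and `make_fake_fetcher` are hypothetical names, the fetcher stands in for real HTTP requests, and the bar is plain text written to stderr.

```python
import math
import sys


def render_bar(done, total, width=30):
    """Return a text progress bar such as '[##--] 1/2'."""
    total = max(total, 1)
    frac = min(done / total, 1.0)  # the estimate may undershoot; cap at 100%
    filled = int(width * frac)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{total}"


def page_all(fetch_page, total_results, limit, cursor_max, show_progress=False):
    """Page through results with a cursor, optionally showing progress.

    fetch_page(cursor, limit) is assumed to return (items, next_cursor),
    with next_cursor set to None once the last page has been served.
    """
    # Estimated number of requests, used only to size the bar.
    expected = max(1, math.ceil(min(total_results, cursor_max) / limit))
    records = []
    cursor = "*"
    done = 0
    while cursor is not None and len(records) < cursor_max:
        items, cursor = fetch_page(cursor, limit)
        if not items:
            break
        records.extend(items)
        done += 1
        if show_progress:
            sys.stderr.write("\r" + render_bar(done, expected))
            sys.stderr.flush()
    if show_progress:
        sys.stderr.write("\n")
    return records[:cursor_max]


def make_fake_fetcher(n_records):
    """Build a stand-in for the server that pages over range(n_records)."""
    data = list(range(n_records))

    def fetch_page(cursor, limit):
        start = 0 if cursor == "*" else int(cursor)
        items = data[start:start + limit]
        next_start = start + limit
        next_cursor = str(next_start) if next_start < n_records else None
        return items, next_cursor

    return fetch_page
```

Calling `page_all(make_fake_fetcher(1200), 1200, 500, 5000, show_progress=True)` pages three times and redraws the bar after each request. Because the bar is driven by an estimate, it is capped at 100% rather than trusted to be exact.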
any thoughts on this @gorbynet ?
Apologies - this is my personal GitHub account, and I only use the Crossref library for work, so I haven't looked at it for months. I've got an audit that needs the deep paging, so I'll have a look at this now.
Hi @sckott , that works perfectly. Thank you!
great, glad it works. I'll merge this into master soon
When systematically retrieving large data sets, it would be useful to have some way of measuring progress through the data harvest, e.g. show how many records have been retrieved, and how many there are in total, while the retrieval is ongoing.