sckott / habanero

client for Crossref search API
https://habanero.readthedocs.io
MIT License

Feature request: progress indicator when deep paging #77

Closed: gorbynet closed this issue 5 years ago

gorbynet commented 6 years ago

When systematically retrieving large data sets, it would be useful to have some way of measuring progress through the data harvest, e.g. show how many records have been retrieved, and how many there are in total, while the retrieval is ongoing.

sckott commented 6 years ago

thanks @gorbynet

I assume you mean with deep paging? Looking into it; I haven't done progress bars in Python before.

sckott commented 6 years ago

Maybe something like https://github.com/weecology/retriever/blob/0112008b710d176fc543be174bcd1205cb1fef1e/retriever/lib/engine.py#L455, using https://pypi.org/project/tqdm/

sckott commented 6 years ago

clint is another option

gorbynet commented 6 years ago

Scott, thanks for picking this up. I'm not sure how it would work (I'm quite inexperienced at Python) but what I meant was that it would be useful to have some way of showing progress when deep paging through a large dataset. That might be a progress bar, or some way of the crossref module feeding back to the calling script so that the script can choose how to reflect that information.
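The second option (the module feeding progress back to the calling script) is usually done with a callback. The sketch below is a minimal illustration of that idea, not habanero's API: `deep_page`, `fetch_page` and `on_progress` are hypothetical names, and each page is assumed to be a dict shaped like the `message` part of a Crossref `/works` response (`items`, `total-results`, `next-cursor`), with an empty `items` list marking the end of the result set.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical callback type: called as on_progress(records_so_far, total_results).
ProgressCallback = Callable[[int, int], None]


def deep_page(fetch_page: Callable[[str], Dict[str, Any]],
              on_progress: Optional[ProgressCallback] = None) -> List[Any]:
    """Follow cursors until the server returns an empty page.

    fetch_page(cursor) is a hypothetical stand-in for one HTTP request. It is
    assumed to return a dict with 'items', 'total-results' and 'next-cursor'.
    """
    records: List[Any] = []
    cursor = "*"  # Crossref's initial cursor value for deep paging
    while True:
        page = fetch_page(cursor)
        items = page["items"]
        if not items:  # assumed: an empty page means the result set is exhausted
            break
        records.extend(items)
        if on_progress is not None:
            # Hand the numbers back; the caller decides how to display them.
            on_progress(len(records), page["total-results"])
        cursor = page["next-cursor"]
    return records
```

For example, passing `lambda got, total: print(f"{got}/{total}")` as `on_progress` prints a running count, while a script that prefers a progress bar could drive one from the same two numbers.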

sckott commented 6 years ago

Thanks, I'll experiment and ask for your feedback.

sckott commented 5 years ago

This is more complicated than I thought. I think we need an idea of how many requests we'll make for a progress bar to work.

We're doing requests in a while loop (https://github.com/sckott/habanero/blob/master/habanero/request_class.py#L68-L73), so we need to look at the total number of results found by the server and also at cursor_max.
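The estimate described here can be written as plain arithmetic: the number of requests is roughly the records to fetch divided by the page size, rounded up, where the records to fetch are capped by cursor_max. This is a sketch under the assumption that every request returns a full page of `limit` records until that cap is reached; `estimate_requests` is a hypothetical helper, not the code on the branch.

```python
from typing import Optional


def estimate_requests(total_results: int, limit: int,
                      cursor_max: Optional[int] = None) -> int:
    """Estimate how many page requests deep paging will need.

    Assumes each request returns `limit` records; cursor_max (if given)
    caps the total number of records retrieved.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    wanted = total_results if cursor_max is None else min(total_results, cursor_max)
    # Integer ceiling division, avoiding floating-point rounding.
    return (wanted + limit - 1) // limit
```

For instance, 12345 results at `limit=500` gives 25 requests, but with `cursor_max=1000` only 2. Since real pages can come back short, this is only an estimate, which is why a bar built on it should tolerate being off by a request or two.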

sckott commented 5 years ago

Okay, install from the progress-bar branch:

```sh
pip3 install --user https://github.com/sckott/habanero/archive/progress-bar.zip
```

and try, e.g.:

```python
from habanero import Crossref
cr = Crossref()
res = cr.works(query="octopus", cursor="*", limit=500, progress_bar=True)
```

sckott commented 5 years ago

let me know what you think

The progress bar is not integrated into the requests themselves, and it's only activated when deep paging. It's just a progress bar on the while loop, driven by an estimate of how many requests will be made.
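A bar on the while loop only needs the loop's iteration count and the request estimate. The stdlib-only sketch below (no tqdm) illustrates that design; `render_bar` and `loop_with_bar` are hypothetical names, and `request_page` stands in for one API call that returns a list of records, with an empty list meaning there is nothing left. Because the request count is only an estimate, the bar is clamped at 100% instead of overflowing.

```python
import sys
from typing import Any, Callable, List, TextIO


def render_bar(done: int, expected: int, width: int = 20) -> str:
    """Render a text bar; `expected` is an estimate, so cap the fill at 100%."""
    expected = max(expected, 1)        # avoid division by zero
    frac = min(done / expected, 1.0)   # estimate may be too low
    filled = int(frac * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{expected}"


def loop_with_bar(request_page: Callable[[], List[Any]], expected: int,
                  out: TextIO = sys.stderr) -> List[Any]:
    """Call request_page() until it returns an empty list, redrawing the bar."""
    items: List[Any] = []
    done = 0
    while True:
        page = request_page()
        if not page:
            break
        items.extend(page)
        done += 1
        out.write("\r" + render_bar(done, expected))  # redraw in place
        out.flush()
    out.write("\n")
    return items
```

Writing to stderr keeps the bar out of any output a script might pipe elsewhere; swapping in tqdm would replace `render_bar` without changing the loop's shape.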

sckott commented 5 years ago

any thoughts on this @gorbynet ?

gorbynet commented 5 years ago

Apologies, this is my personal GitHub account, and I only use the Crossref library for work and haven't looked at it for months. I've got an audit that needs deep paging, so I'll have a look at this now.

gorbynet commented 5 years ago

Hi @sckott , that works perfectly. Thank you!

sckott commented 5 years ago

Great, glad it works. I'll merge this into master soon.