kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0

Notify user of current job's progress in output #29

Open jacksongoode opened 3 years ago

jacksongoode commented 3 years ago

How would one go about running client.process and then continuing once complete? It seems anything after the process is discarded.

kermitt2 commented 3 years ago

Hello @jacksongoode! Not sure I understand the question... this is a client, and the GROBID server remains "warm". The client just sends PDFs and gets back XML, so what exactly would be held by the client here?

jacksongoode commented 3 years ago

Ahh, I see. I managed to get everything working but was confused with the lack of output even with the verbose flag. Would it be possible to capture the status of the current job through the python client?

kermitt2 commented 3 years ago

> Would it be possible to capture the status of the current job through the python client?

Yes, sure, we could extend the "verbose" mode to make it more readable and useful. Which information would you like to see?

We could prefix each message with the file name/path and indicate "sent", "output written", things like that maybe? But the queries usually run in parallel and are pretty fast, so it might make a mess of the console.
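To make the idea concrete, here is a minimal sketch of such per-file status lines. The `report` helper, its parameters, and the message format are all hypothetical and not part of the current client; the lock is only there so that lines printed from parallel workers do not interleave.

```python
import sys
import threading

# Shared lock so that status lines from parallel workers stay whole.
_print_lock = threading.Lock()


def report(path, status, verbose=True, out=sys.stderr):
    """Print '<path>: <status>' when verbose is on (hypothetical helper)."""
    if not verbose:
        return
    with _print_lock:
        print(f"{path}: {status}", file=out, flush=True)
```

A worker could call `report(pdf_path, "sent")` before the request and `report(pdf_path, "output written")` after saving the result. With thousands of files this would still scroll quickly, which is the console-mess concern.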

In another issue we discussed having a progress bar, but it means counting the files in a first pass and thus slowing the process down a bit, in particular for folders with millions of PDFs (which is a real-world use case for me). It could be optional?
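A rough sketch of what the optional counting pass could look like, using only the standard library. The names `iter_pdfs`, `process_all`, `process_one`, and `count_first` are assumptions made for illustration and do not exist in the client; `process_one` stands in for whatever sends one PDF to the server.

```python
import concurrent.futures
import os
import sys


def iter_pdfs(root):
    """Yield the paths of PDF files under root, walking lazily."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


def process_all(root, process_one, count_first=False, workers=4, out=sys.stderr):
    """Run process_one on every PDF and print a running counter.

    With count_first=True an extra pass counts the files first, so the
    counter can show 'done/total'; otherwise only 'done' is shown.
    """
    total = sum(1 for _ in iter_pdfs(root)) if count_first else None
    done = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: this sketch submits everything up front and does not
        # inspect errors raised by process_one.
        futures = [pool.submit(process_one, path) for path in iter_pdfs(root)]
        for _ in concurrent.futures.as_completed(futures):
            done += 1
            suffix = f"/{total}" if total is not None else ""
            print(f"\rprocessed {done}{suffix}", end="", file=out, flush=True)
    print(file=out)  # finish the counter line
    return done
```

Without the counting pass the user still sees that work is going on, just without a percentage. For folders with millions of files, submitting in batches would be kinder to memory than the single list used here.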

jacksongoode commented 3 years ago

Yes, I think something along the lines of a tqdm-style progress bar would be really nice. I'm currently working with ~2k PDFs, so printing each one to the console would be a mess.

But for a lot of users, the long pause in the script might cause some concern if they aren't aware that Grobid is doing its job.

jacksongoode commented 2 years ago

In addition to this feature, I am also curious if it makes sense to suppress the output when the file exists unless verbose is specified?
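Something like the following is what I have in mind, as a sketch only. The `should_skip` function and its `force` and `verbose` parameters are hypothetical names chosen for illustration, not the client's actual code.

```python
import os
import sys


def should_skip(output_path, force=False, verbose=False, out=sys.stderr):
    """Return True when the output already exists and force is off.

    The 'already exists' message is printed only in verbose mode.
    """
    if force or not os.path.isfile(output_path):
        return False
    if verbose:
        print(f"{output_path} already exists, skipping", file=out)
    return True
```

A rerun over a mostly processed folder would then stay quiet by default, while verbose mode would still list every skipped file.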