kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0
279 stars, 75 forks

Code formatting with "black", and make process_pdf() a file-independent function #25

Closed: gchers closed this pull request 3 years ago

gchers commented 3 years ago

Hi there. Thanks for your code!

I took the liberty of running black (https://pypi.org/project/black/) to reformat the code according to standard conventions (e.g., PEP 8) (commits 808e3b2f6c6b817c7950ec4e657f3f2af60c31f4 and 60ad3f3289cffd5f741c0993712549ec2381f291).

Secondly (and this is the main reason for my PR), I refactored the code so that process_pdf() returns the text instead of writing it directly to a TEI file (commit 557176318b1d5e2f41258210f617a62e62ae68c0).

My use case: I want to process local files and then postprocess the resulting TEI XML output directly from Python. For this, writing a .tei.xml file and then reading it back would incur unnecessary overhead.
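If process_pdf() returns the TEI XML as a string, the result can be parsed in memory with no .tei.xml round trip. The sketch below uses only the standard library's `xml.etree.ElementTree`. The helper name `extract_title` and the sample document are hypothetical. It also assumes that the title sits at `teiHeader/fileDesc/titleStmt/title` in the TEI namespace, which is where GROBID normally puts it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# TEI documents produced by GROBID use this default namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_title(tei_xml: str) -> Optional[str]:
    """Return the document title from a TEI XML string, or None if absent.

    Hypothetical helper: it works directly on the string returned by
    process_pdf(), so no intermediate .tei.xml file is written or read.
    """
    root = ET.fromstring(tei_xml)
    title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    if title is None or title.text is None:
        return None
    return title.text.strip()


# A tiny hand-written TEI fragment standing in for real GROBID output.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">An Example Paper</title></titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""

print(extract_title(SAMPLE_TEI))  # An Example Paper
```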

My changes shouldn't be disruptive: process_batch() still calls process_pdf() as before, just with fewer arguments, and now takes care of writing the output to a file itself.
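This split can be sketched as two functions. The first returns the TEI text for a single PDF. The second is a batch wrapper that writes each result to disk. This is a minimal illustration under stated assumptions, not the client's actual code. The signatures and the `.tei.xml` output naming are hypothetical, and so is the `send` callable, which stands in for the HTTP request to the GROBID service.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical transport: takes the PDF bytes and returns TEI XML text.
# In a real client this would send the file to the GROBID service.
Sender = Callable[[bytes], str]


def process_pdf(pdf_path: Path, send: Sender) -> str:
    """Return the TEI XML for one PDF; nothing is written to disk."""
    return send(pdf_path.read_bytes())


def process_batch(pdf_paths: Iterable[Path], output_dir: Path, send: Sender) -> List[Path]:
    """Call process_pdf() for each file and write each result to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in pdf_paths:
        tei = process_pdf(pdf_path, send)
        # Hypothetical naming scheme: paper.pdf -> paper.tei.xml
        out_path = output_dir / (pdf_path.stem + ".tei.xml")
        out_path.write_text(tei, encoding="utf-8")
        written.append(out_path)
    return written


if __name__ == "__main__":
    # Demo with a stub sender, so no GROBID server is needed.
    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "paper.pdf"
        pdf.write_bytes(b"%PDF-1.4 fake")
        print(process_pdf(pdf, lambda data: "<TEI/>"))
        print(process_batch([pdf], Path(tmp) / "out", lambda data: "<TEI/>"))
```

Because `send` is injected, process_pdf() can be exercised with a stub and no running server. Only process_batch() decides where the output ends up.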

Ideally, the API would also expose process_batch() in a similar fashion in the future, but having process_pdf() alone would already be a good thing.
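One way to expose a batch operation "in a similar fashion" is to yield results instead of writing them. This is a minimal sketch, not an existing API. The generator name `iter_batch` and its parameters are hypothetical, and `process_pdf` is passed in as any callable that maps a path to TEI text. A thread pool is used here only as one plausible way to keep several requests in flight.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, Iterator, Tuple


def iter_batch(
    pdf_paths: Iterable[Path],
    process_pdf: Callable[[Path], str],
    n_workers: int = 4,
) -> Iterator[Tuple[Path, str]]:
    """Yield (path, tei_xml) pairs in input order, keeping results in memory.

    Hypothetical API: nothing is written to disk; the caller decides
    whether to save, parse, or discard each TEI string.
    """
    paths = list(pdf_paths)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map() returns results in the order of `paths`,
        # even if individual calls finish out of order.
        for path, tei in zip(paths, pool.map(process_pdf, paths)):
            yield path, tei
```

A caller could then write `for path, tei in iter_batch(paths, my_process_pdf): ...` and postprocess each document directly, which is the same use case as above but for many files.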

Let me know what you think about these changes.

kermitt2 commented 3 years ago

Hi @gchers !

Thanks a lot for the PR.

The changes are very clear and make perfect sense: thank you!

gchers commented 3 years ago

Great, thank you! :)