Hi there. Thanks for your code!
First, I took the liberty of running black (https://pypi.org/project/black/) to reformat the code according to standard style conventions (e.g., PEP 8) (808e3b2f6c6b817c7950ec4e657f3f2af60c31f4, 60ad3f3289cffd5f741c0993712549ec2381f291).
Second, and the main reason for this PR, I refactored the code so that process_pdf() returns text instead of writing directly into a TEI file (557176318b1d5e2f41258210f617a62e62ae68c0).
My use case: I want to process local files and then postprocess the resulting TEI XML output directly from Python. For this, writing a .tei.xml file and then reading it back would incur needless overhead.
My changes shouldn't be disruptive: process_batch() calls process_pdf() as before, with fewer arguments, and takes care of writing the output to a file.
Ideally, in the future it would be nice if the API exposed process_batch() in a similar fashion, but exposing process_pdf() alone would already be a good thing.
Let me know what you think about these changes.
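For reviewers who want the shape of the refactor at a glance, here is a minimal sketch of the split between the two functions. It is not the code in the commits: the signatures, the convert callable (a stand-in for the request to the service), and fake_convert are all hypothetical and only illustrate that process_pdf() returns a string while process_batch() owns the file writing.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def process_pdf(pdf_path: Path, convert: Callable[[bytes], str]) -> str:
    """Return the TEI XML for one PDF as a string; nothing is written to disk.

    `convert` is a hypothetical stand-in for the call to the service.
    """
    return convert(pdf_path.read_bytes())


def process_batch(
    pdf_paths: Iterable[Path],
    output_dir: Path,
    convert: Callable[[bytes], str],
) -> List[Path]:
    """Call process_pdf() for each input and write each result to a .tei.xml file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in pdf_paths:
        tei = process_pdf(pdf_path, convert)
        out_path = output_dir / (pdf_path.stem + ".tei.xml")
        out_path.write_text(tei, encoding="utf-8")
        written.append(out_path)
    return written


def fake_convert(data: bytes) -> str:
    """Placeholder converter so the sketch runs without a server (illustrative only)."""
    return "<TEI><!-- %d bytes processed --></TEI>" % len(data)
```

With this split, a caller who wants to postprocess in memory calls process_pdf(path, fake_convert) and works on the returned string, while the batch path still produces one .tei.xml file per input.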