kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0

Lacking capability for in-memory processing. #66

Open maxupp opened 8 months ago

maxupp commented 8 months ago

The fact that output can only be written to files and not kept in memory for further processing is a major drawback. I suggest returning a dictionary with all the TEI objects.

kermitt2 commented 8 months ago

Hi @maxupp !

If you process a single file, client.process_pdf() returns the response in memory, and you can parse it with a Python XML parser.
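A minimal sketch of that parsing step using only the standard library's xml.etree.ElementTree, in keeping with the client's no-extra-dependencies approach. It assumes the TEI text from process_pdf() is already in hand; because the exact return shape of process_pdf() may differ between client versions, the example uses a small hand-written TEI fragment instead. The helper extract_title_and_authors is hypothetical and not part of grobid_client_python.

```python
import xml.etree.ElementTree as ET

# GROBID emits TEI XML in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_title_and_authors(tei_xml: str) -> dict:
    """Hypothetical helper: parse an in-memory TEI string and pull out a few header fields."""
    root = ET.fromstring(tei_xml)

    # Main title of the document, if present.
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = title_el.text.strip() if title_el is not None and title_el.text else None

    # Author names from the header's bibliographic description.
    authors = []
    for pers in root.findall(".//tei:sourceDesc//tei:author/tei:persName", TEI_NS):
        names = [el.text for el in pers.findall("tei:forename", TEI_NS) if el.text]
        surname = pers.find("tei:surname", TEI_NS)
        if surname is not None and surname.text:
            names.append(surname.text)
        if names:
            authors.append(" ".join(names))
    return {"title": title, "authors": authors}


# Hand-written stand-in for a real GROBID response.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">A Sample Paper</title></titleStmt>
      <sourceDesc><biblStruct><analytic>
        <author><persName><forename type="first">Ada</forename><surname>Lovelace</surname></persName></author>
      </analytic></biblStruct></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""

print(extract_title_and_authors(sample))
# {'title': 'A Sample Paper', 'authors': ['Ada Lovelace']}
```

The same function can be applied directly to the response text from a real server, with no intermediate file.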

If you process files in batch, you can change the behavior here so that the server responses are kept in memory instead of being written to files on disk: https://github.com/kermitt2/grobid_client_python/blob/master/grobid_client/grobid_client.py#L228
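One way to get the dictionary of TEI objects suggested in the issue, without editing the client in place, is to wrap the per-file call. This is a minimal sketch under stated assumptions: process_one is a hypothetical callable that takes a PDF path and returns (status_code, tei_text), for example a thin wrapper around the client's single-file request; process_batch_in_memory is likewise hypothetical and not part of grobid_client_python. Threads are used because each call mostly waits on HTTP I/O.

```python
import concurrent.futures
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical per-file function: PDF path -> (HTTP status code, TEI text).
ProcessFn = Callable[[str], Tuple[int, str]]


def process_batch_in_memory(
    pdf_paths: Iterable[str],
    process_one: ProcessFn,
    n_workers: int = 4,
) -> Tuple[Dict[str, ET.Element], Dict[str, int]]:
    """Run process_one concurrently and keep the results in memory.

    Successful responses are parsed into ElementTree roots keyed by input path;
    non-200 or empty responses are recorded in a separate dict of status codes.
    """
    results: Dict[str, ET.Element] = {}
    errors: Dict[str, int] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = {executor.submit(process_one, path): path for path in pdf_paths}
        for future in concurrent.futures.as_completed(futures):
            path = futures[future]
            status, text = future.result()  # re-raises if process_one raised
            if status == 200 and text:
                results[path] = ET.fromstring(text)
            else:
                errors[path] = status
    return results, errors


def fake_process(path: str) -> Tuple[int, str]:
    # Stand-in for a real call to a GROBID server.
    if path.endswith("broken.pdf"):
        return 500, ""
    return 200, f'<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>{path}</text></TEI>'


tei, errors = process_batch_in_memory(["a.pdf", "b.pdf", "broken.pdf"], fake_process)
print(sorted(tei), errors)
# ['a.pdf', 'b.pdf'] {'broken.pdf': 500}
```

Keeping parsed roots rather than raw strings lets downstream code query the TEI immediately; storing the raw text instead would only require dropping the ET.fromstring call.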

Or do I misunderstand the issue?

The idea of this client is to provide a simple basis, depending only on standard Python libraries, that can be extended as needed.