kermitt2 / pdfalto

PDF to XML ALTO file converter
GNU General Public License v2.0
207 stars 67 forks

Feature request: allow reading from stdin #16

Open Aazhar opened 6 years ago

Aazhar commented 6 years ago

kermitt2/pdf2xml/issues/5

Downchuck commented 3 years ago

Seems like this would need to be wrapped in a parent tag, so that _metadata, _annot and _outline may be maintained.

I'm curious whether the PDF format terminates in a way that would let something like cat *.pdf | ./pdfalto - > output.xml work.
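On the termination question, here is a rough sketch of how a concatenated stream might be split back into individual files before conversion. It relies on two facts about the format: a PDF starts with a `%PDF-` header and ends with a `%%EOF` marker. A file with incremental updates contains several `%%EOF` markers, so the sketch only treats a `%%EOF` that is directly followed by a new `%PDF-` header as a boundary. The function name `split_concatenated_pdfs` is made up for illustration and is not part of pdfalto. This is a heuristic, not a parser: the same byte sequence could in principle occur inside an uncompressed stream or an embedded file.

```python
import re

# Heuristic boundary: an end-of-file marker, optional whitespace, then a new
# PDF header. The lookahead keeps the header out of the match so it stays
# attached to the following document.
_BOUNDARY = re.compile(rb"%%EOF[ \t\r\n]*(?=%PDF-)")


def split_concatenated_pdfs(data: bytes) -> list:
    """Split back-to-back PDF files into one bytes chunk per file (hypothetical helper)."""
    chunks = []
    start = 0
    for match in _BOUNDARY.finditer(data):
        # Each chunk runs through the %%EOF marker and its trailing whitespace.
        chunks.append(data[start:match.end()])
        start = match.end()
    # Whatever remains after the last boundary is the final document.
    if data[start:].strip():
        chunks.append(data[start:])
    return chunks
```

With something like this, the bytes read from `sys.stdin.buffer` could be split and each chunk written to a temporary file for conversion, which would also give each document its own output rather than one merged XML file.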