kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0
279 stars 75 forks

Nested Output Directory #20

Closed: fschlatt closed this issue 3 years ago

fschlatt commented 3 years ago

When a nested input directory is given together with an output directory, the input file tree is not reproduced in the output directory. All the .tei.xml files are written flat into the top level of the output directory. This causes problems when PDF files in different subdirectories of the input directory have the same name: if the client is told to force-overwrite existing XML files, the results for identically named PDFs overwrite each other. To fix this, the nested file tree needs to be mirrored in the output directory. I have created a PR that solves this issue, to merge at your discretion. I've also added creation of the output directory in case it doesn't exist yet.
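A minimal sketch of the idea in the comment above: each PDF's path relative to the input root is reused under the output root, so identically named PDFs in different subdirectories map to different .tei.xml files, and missing output directories are created on the fly. It assumes the output files keep the `.tei.xml` suffix and only matches lowercase `.pdf`; `output_path_for` and `planned_outputs` are hypothetical names for illustration, not the code merged in PR #21.

```python
from pathlib import Path


def output_path_for(pdf_path, input_root, output_root):
    """Hypothetical helper: map a PDF under input_root to a .tei.xml path
    under output_root that mirrors the PDF's relative subdirectory."""
    relative = Path(pdf_path).relative_to(input_root)
    # Keep the subdirectories, replace the .pdf extension with .tei.xml (assumed suffix)
    target = Path(output_root) / relative.parent / (relative.stem + ".tei.xml")
    # Create the nested output directory (and output_root itself) if missing
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


def planned_outputs(input_root, output_root):
    """Hypothetical helper: yield (pdf, tei_xml) pairs for every PDF under input_root.
    Only lowercase .pdf files are matched in this sketch."""
    for pdf in sorted(Path(input_root).rglob("*.pdf")):
        yield pdf, output_path_for(pdf, input_root, output_root)
```

With `in/a/paper.pdf` and `in/b/paper.pdf`, this yields `out/a/paper.tei.xml` and `out/b/paper.tei.xml` instead of two writes to `out/paper.tei.xml`, so forcing an overwrite no longer lets one result replace the other.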

kermitt2 commented 3 years ago

Thanks a lot @fschlatt! PR #21 merged.