Open bertsky opened 1 year ago
Clearly, this is related to https://github.com/qurator-spk/eynollah/pull/86/commits/c606391c312eceab9aa3ebff071bdf12a30b45cc (dir_in
mode), but it's a huge changeset, and I don't understand it (also, the diff is suboptimal).
BTW, what was the reason for dir_in
(and why not also put the loaded models into class members in the single-image mode, so most of the dir_in
conditionals would have been unnecessary)? IMHO for efficiency it would be necessary to separate CPU-bound from GPU-bound computation and make them run quasi parallel (for example via queueing). The dir_in
change does avoid reloading models, but still requires many transfers to/from GPU memory and waiting for pre- and postprocessing, which is CPU-bound. Also, it's not possible to write a OCR-D wrapper for this kind of API.
I have the same problem with 0.3.0:
TypeError: expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
File "/usr/local/share/pyenv/versions/3.7.17/bin/ocrd-eynollah-segment", line 8, in <module>
sys.exit(main())
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/qurator/eynollah/ocrd_cli.py", line 8, in main
return ocrd_cli_wrap_processor(EynollahProcessor, *args, **kwargs)
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 126, in ocrd_cli_wrap_processor
run_processor(processorClass, mets_url=mets, workspace=workspace, **kwargs)
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 131, in run_processor
raise err
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 128, in run_processor
processor.process()
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/qurator/eynollah/processor.py", line 59, in process
Eynollah(**eynollah_kwargs).run()
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/qurator/eynollah/eynollah.py", line 3092, in run
self.writer.write_pagexml(pcgts)
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/site-packages/qurator/eynollah/writer.py", line 138, in write_pagexml
out_fname = os.path.join(self.dir_out, self.image_filename_stem) + ".xml"
File "/usr/local/share/pyenv/versions/3.7.17/lib/python3.7/posixpath.py", line 80, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
Reproducer:
#!/bin/sh
set -ex
test_id=`basename $0`
cd `mktemp -d /tmp/$test_id-XXXXX`
# Prepare processors
ocrd resmgr download ocrd-eynollah-segment default
# Prepare test workspace
wget https://qurator-data.de/examples/actevedef_718448162.first-page+binarization+segmentation.zip
unzip actevedef_718448162.first-page+binarization+segmentation.zip
cd actevedef_718448162.first-page+binarization+segmentation
# Run tests
ocrd-eynollah-segment -P models default -I OCR-D-IMG-BIN -O TEST-EYNOLLAH-SEG
@mikegerber for a recent version usable in the OCR-D ecosystem, you need https://github.com/qurator-spk/eynollah/tree/706433c5049c63c6e16fee5f71d81a7e507abe06 which is part of https://github.com/qurator-spk/eynollah/pull/108
Thanks but - I think - I don't currently have this problem - no conflicting qurator packages here. (And my recommendation to the team was to move away from the - useless - common
qurator.*
namespace.)
The most thorough test regarding the namespace inconsistency problem I know of is to install both sbb_binarize and eynollah in devel mode and then run --help
on the OCR-D CLIs.
@bertsky Is this particular problem even connected to the namespace problem? It seems unrelated.
@mikegerber no, but as as argued in #108 and above: you need the namespace problem fixed in the current ecosystem, while due to the present issue the master branch is dysfunctional since #86.
Hi, we are aware of the issue(s) with the OCR-D CLI. Please note that these all relate to the OCR-D CLI only, while the tool is otherwise perfectly usable - "completely unusable in certain circumstances" is a bit too harsh/inprecise, this is only the case when the "circumstances" are OCR-D.
I believe we had this discussion already in the past (even multiple times), but please take into account that the OCR-D CLI is not the top priority for our development here, as this is not an OCR-D project. In fact, we are currently hiring another position to work on this, so please be a bit patient.
We try to balance the project work plan, issues reported here and the availability of staff when planning the work and priorities, so please refrain from introducing new labels for urgency! But if it helps we can introduce an "OCR-D" label for all issues relating to that.
I also don't see the point of urging this right now. We had a bit of a hustle when all this surfaced during the last ocrd_all release/repair sprint, but as explained above we did find a workable interim solution (ie. reverting #86 and #97 while cherry-picking #100).
I finally found some time to dive a little deeper into this issue, and now I must agree with @bertsky that https://github.com/qurator-spk/eynollah/commit/c606391c312eceab9aa3ebff071bdf12a30b45cc probably should not have been merged in the first place...alas, we now have it in main, so we will proceed to dissect the problematic changes from the larger changeset.
Since the last update I am getting
Have not looked further yet.