mikegerber closed this issue 4 months ago
pytest -k integ_ocrd_cli fails. The METS fileSec of the test data looks like this:
<mets:fileSec>
  <mets:fileGrp USE="OCR-D-GT-PAGE">
    <mets:file MIMETYPE="application/xml" ID="OCR-D-GT-PAGE_00000024">
      <mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="OCR-D-GT-PAGE/00000024.page.xml"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="OCR-D-OCR-CALAMARI">
    <mets:file MIMETYPE="application/vnd.prima.page+xml" ID="OCR-D-OCR-CALAMARI_0001">
      <mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="OCR-D-OCR-CALAMARI/OCR-D-OCR-CALAMARI_0001.xml"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="OCR-D-OCR-TESS">
    <mets:file MIMETYPE="application/vnd.prima.page+xml" ID="OCR-D-OCR-TESS_0001">
      <mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="OCR-D-OCR-TESS/OCR-D-OCR-TESS_0001.xml"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>
This used to work.
Maybe it's because the xlink:href isn't really a URL? Or is it?
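One way to answer the question above is to look at whether the href carries a scheme. The sketch below uses only the standard library; `describe_href` is a hypothetical helper written for illustration and is not part of ocrd. By this check, the hrefs in the test data are relative references rather than absolute URLs.

```python
from urllib.parse import urlsplit


def describe_href(href: str) -> str:
    """Classify an href as an absolute URL or a relative reference.

    Hypothetical helper for illustration only, not an ocrd API.
    """
    parts = urlsplit(href)
    # An absolute URL carries a scheme such as "https" or "file";
    # a plain relative path like "dir/file.xml" has none.
    return "absolute URL" if parts.scheme else "relative reference"


if __name__ == "__main__":
    for href in (
        "OCR-D-GT-PAGE/00000024.page.xml",
        "https://qurator-data.de/examples/actevedef_718448162.zip",
    ):
        print(href, "is a", describe_href(href))
```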
Judging by ocrd_model's ocrd_file.py, the FLocat is also supposed to have a LOCTYPE and OTHERLOCTYPE.
Our other "standard"/commonly used example files have the LOCTYPE, so I'm trying those. The embedded test data may just be invalid and may have been handled more gracefully in earlier ocrd versions.
https://qurator-data.de/examples/actevedef_718448162.first-page+binarization+segmentation.zip has LOCTYPE
https://qurator-data.de/examples/actevedef_718448162.zip has LOCTYPE
https://qurator-data.de/examples/actevedef_718448162.first-page.zip has LOCTYPE
Adding LOCTYPE/OTHERLOCTYPE to the test data fixes the tests.
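For anyone who needs to patch similar files, here is a minimal sketch of that kind of fix using only the standard library. The attribute values LOCTYPE="OTHER" and OTHERLOCTYPE="FILE" are an assumption about the convention for local files, not necessarily what was committed here. `add_loctype` and `SAMPLE` are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Keep the usual prefixes when serializing instead of ns0/ns1.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def add_loctype(mets_xml: str) -> str:
    """Add LOCTYPE/OTHERLOCTYPE to every mets:FLocat that lacks a LOCTYPE.

    Assumption: "OTHER"/"FILE" is the right pair for local file references.
    """
    root = ET.fromstring(mets_xml)
    for flocat in root.iter(f"{{{METS_NS}}}FLocat"):
        if "LOCTYPE" not in flocat.attrib:
            flocat.set("LOCTYPE", "OTHER")
            flocat.set("OTHERLOCTYPE", "FILE")
    return ET.tostring(root, encoding="unicode")


# Hypothetical, cut-down METS document modelled on the test data above.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-GT-PAGE">
      <mets:file MIMETYPE="application/xml" ID="OCR-D-GT-PAGE_00000024">
        <mets:FLocat xlink:href="OCR-D-GT-PAGE/00000024.page.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

if __name__ == "__main__":
    print(add_loctype(SAMPLE))
```

FLocat elements that already carry a LOCTYPE are left untouched, so running this over the example files that already have it should not change them.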
I'll commit the fix but leave this open until I can discuss it with @kba, as I'm not sure whether it's a regression in core or something that core could conveniently handle.
This was probably encountered elsewhere too. Closing.