qurator-spk / eynollah

Document Layout Analysis
Apache License 2.0

No output xml on empty pages? #29

Closed: jbarth-ubhd closed this issue 3 years ago

jbarth-ubhd commented 3 years ago

I processed https://digi.ub.uni-heidelberg.de/diglitData/v/testset-ls-20201029.tgz with eynollah, but these 3 XML output files are missing:

< leere_Seite_-_1564_-_drwFronsperger1564_-_054v.tif
< leere_Seite_-_1700_-_oxenstierna1700_-_f.tif
< leere_Seite_-_1775_-_karl_theodor1775b_-_b.tif
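The three "<" lines look like diff output from comparing the input and output file lists. A minimal Python sketch of the same check, assuming (hypothetically) that each PAGE-XML output keeps its input image's file stem, e.g. page.tif produces page.xml; the directory arguments are placeholders:

```python
import sys
from pathlib import Path


def missing_outputs(input_dir, output_dir, in_ext=".tif", out_ext=".xml"):
    """Return sorted names of input images that have no output with the same stem.

    Assumption (hypothetical naming rule): page.tif is expected to produce page.xml.
    """
    inputs = {p.stem: p.name for p in Path(input_dir).glob("*" + in_ext)}
    outputs = {p.stem for p in Path(output_dir).glob("*" + out_ext)}
    return sorted(name for stem, name in inputs.items() if stem not in outputs)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_missing.py INPUT_DIR OUTPUT_DIR
    for name in missing_outputs(sys.argv[1], sys.argv[2]):
        print("<", name)
```

Matching on the stem rather than the full name keeps the check independent of the extension swap between input and output.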

There are also many "ERROR eynollah - cannot convert float NaN to integer" messages, but I assume those are not real errors.
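The "cannot convert float NaN to integer" text is Python's standard ValueError message for int() applied to a NaN float. This report does not show where in eynollah it is raised, so whether it is harmless remains open. A minimal reproduction, plus a hypothetical NaN-safe helper to_int_or_none (not part of eynollah):

```python
import math


def to_int_or_none(value):
    """Hypothetical helper: convert a float to int, returning None for NaN instead of raising."""
    if math.isnan(value):
        return None
    return int(value)


# Plain int() on NaN raises the same ValueError text seen in the log.
try:
    int(float("nan"))
except ValueError as err:
    print(err)  # cannot convert float NaN to integer

print(to_int_or_none(float("nan")))  # None
print(to_int_or_none(3.7))  # 3
```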

I don't know exactly which scan triggers it, but there are 3 error messages like this in the log:

20:56:53.243 INFO eynollah - resize and enhance image
20:56:53.245 INFO eynollah - Detected 300 DPI
20:57:05.408 INFO eynollah - Found 1 columns ([[9.9999952e-01 4.7599070e-07 1.7426101e-17 7.0970045e-18 3.1075767e-20
  1.6549760e-17]])
20:57:05.428 INFO eynollah - Image is not enhanced
20:57:05.474 INFO eynollah - Enhancing took 12.230977296829224s
20:57:10.428 INFO eynollah - Image dimensions: 448x672
20:57:47.870 INFO eynollah - Image dimensions: 448x672
20:58:21.326 INFO eynollah - Image dimensions: 448x672
20:59:06.024 INFO eynollah - ratio_of_two_models: 97.82006537033959
20:59:06.467 INFO eynollah - Textregion detection took 120.9934618473053s
20:59:12.820 ERROR eynollah - zero-size array to reduction operation minimum which has no identity
20:59:12.820 INFO eynollah - Graphics detection took 6.352412223815918s
20:59:12.820 INFO eynollah - cont_page [array([[ 385,  190],
       [2528,  190],
       [2528, 4049],
       [ 385, 4049]])]
20:59:12.820 INFO eynollah - No columns detected, outputting an empty PAGE-XML
<exif><width>2573</width><height>4374</height><photometricInterpretation>RGB</photometricInterpretation><n_frames>1</n_frames><compression>raw</compression><photometric_interpretation>None</photometric_interpretation><xResolution>300</xResolution><yResolution>300</yResolution><resolutionUnit>inches</resolutionUnit><resolution>300</resolution></exif>
Traceback (most recent call last):
  File "/usr/local/bin/eynollah", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/qurator/eynollah/cli.py", line 135, in main
    pcgts = eynollah.run()
  File "/usr/local/lib/python3.7/dist-packages/qurator/eynollah/eynollah.py", line 1618, in run
    pcgts = self.writer.build_pagexml_no_full_layout([], page_coord, [], [], [], [], [], [], [], [], [], cont_page)
TypeError: build_pagexml_no_full_layout() missing 1 required positional argument: 'cont_page'
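The "zero-size array to reduction operation minimum which has no identity" ERROR in the log above is the ValueError NumPy raises when taking the minimum of an empty array. Which array was empty during graphics detection is not shown here; the sketch below only reproduces the message and shows one hypothetical guard, safe_min (not an eynollah function):

```python
import numpy as np


def safe_min(values, default=None):
    """Hypothetical guard: return the minimum of values, or default if there are none."""
    arr = np.asarray(values)
    if arr.size == 0:
        return default
    return arr.min()


# np.min on an empty array has no identity element to fall back on, so it raises.
try:
    np.min(np.array([]))
except ValueError as err:
    print(err)  # zero-size array to reduction operation minimum which has no identity

print(safe_min([]))  # None
print(safe_min([3, 1, 2]))  # 1
```

NumPy's own initial= keyword (e.g. np.min(arr, initial=np.inf)) is an alternative when a sentinel value makes sense.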

kba commented 3 years ago

There was a missing argument for the "empty page" call to build_pagexml_no_full_layout, fixed in #30.
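Because cont_page was passed as the last positional argument yet still reported missing, the call evidently supplied one argument fewer than the signature expects, so every value after the gap shifted one slot to the left. A minimal sketch with a hypothetical four-parameter build_page function (not eynollah's real signature):

```python
def build_page(a, b, c, cont_page):
    """Hypothetical stand-in: three data arguments followed by cont_page."""
    return {"a": a, "b": b, "c": c, "cont_page": cont_page}


# One positional argument too few: "PAGE" lands in slot c,
# so Python reports the *last* parameter, cont_page, as missing.
try:
    build_page([], [], "PAGE")
except TypeError as err:
    print(err)  # build_page() missing 1 required positional argument: 'cont_page'

# Passing cont_page by keyword keeps it in the right slot; a forgotten
# positional argument is then reported under its own name ('c') instead.
page = build_page([], [], [], cont_page="PAGE")
print(page["cont_page"])  # PAGE
```

Whether #30 switched to keyword arguments or simply added the missing positional value is not shown in this thread.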