kba / transkribus-to-prima

Convert Transkribus PAGE-XML to standard PAGE-XML

ReadingOrder: be robust if missing #18

Open bertsky opened 1 year ago

bertsky commented 1 year ago

Currently, if the input has no ReadingOrder, or has a ReadingOrder without an OrderedGroup, the reading order conversion crashes with an IndexError:

Traceback (most recent call last):
  File "/data/ocr-d/ocrd_all/venv/bin/transkribus-to-prima", line 8, in <module>
    sys.exit(cli())
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/transkribus_to_prima/cli.py", line 27, in cli
    getattr(converter, f'convert_{convert}')()
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/transkribus_to_prima/convert.py", line 29, in convert_reading_order
    ro = self.tree.xpath('//*[local-name()="ReadingOrder"]/*[local-name()="OrderedGroup"]')[0]
IndexError: list index out of range
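
The traceback shows the XPath query returning an empty list, which is then indexed with `[0]`. One possible way to make this robust is to look the element up first and skip the conversion when it is absent. The following is only a minimal sketch, not the project's actual code: it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra dependencies, and `find_ordered_group`, the simplified `convert_reading_order`, and the sample documents are hypothetical names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used only to build the sample documents below (assumption:
# the 2013-07-15 PAGE namespace; the lookup itself ignores namespaces).
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"


def local_name(elem):
    """Return the tag without its namespace, or '' for non-element nodes."""
    tag = elem.tag
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def find_ordered_group(root):
    """Return the first ReadingOrder/OrderedGroup element, or None if missing."""
    for elem in root.iter():
        if local_name(elem) == "ReadingOrder":
            for child in elem:
                if local_name(child) == "OrderedGroup":
                    return child
    return None


def convert_reading_order(root):
    """Hypothetical stand-in for the converter step.

    Returns False (instead of raising IndexError) when there is nothing
    to convert, and True when an OrderedGroup was found.
    """
    group = find_ordered_group(root)
    if group is None:
        return False
    # ... the real conversion of `group` would go here ...
    return True


# Three sample inputs: complete, no ReadingOrder, ReadingOrder without group.
WITH_RO = (
    f'<PcGts xmlns="{PAGE_NS}"><Page><ReadingOrder>'
    f'<OrderedGroup id="ro1"/></ReadingOrder></Page></PcGts>'
)
NO_RO = f'<PcGts xmlns="{PAGE_NS}"><Page/></PcGts>'
NO_GROUP = f'<PcGts xmlns="{PAGE_NS}"><Page><ReadingOrder/></Page></PcGts>'

for sample in (WITH_RO, NO_RO, NO_GROUP):
    print(convert_reading_order(ET.fromstring(sample)))
```

With lxml, the same idea would amount to storing the result of `self.tree.xpath(...)` in a variable and returning early if that list is empty, rather than indexing it directly.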