This commit stops treating page numbers that are neither Arabic nor Roman numerals as page targets, and renders them as normal text instead.
In doing so it fixes the following:
2024-08-17 22:28:23,333 INFO python-derivermodule version: 1.0.25; hocr version: 1.1.61; and entrypoint version: 1.0.1.
2024-08-17 22:28:23,333 INFO sourceFile: '/item/DTIC_ADA040218_abbyy.gz' -> targetFile: '/var/tmp/tmp/generated/DTIC_ADA040218/tmp_daisy.zip'
2024-08-17 22:28:23,357 INFO converting /item/DTIC_ADA040218_abbyy.gz to hocr
2024-08-17 22:28:27,628 INFO successfully converted /item/DTIC_ADA040218_abbyy.gz to hocr (/tmp/tmp.hocr.html)
2024-08-17 22:28:27,628 INFO converting /tmp/tmp.hocr.html to daisy
2024-08-17 22:28:27,978 INFO Failure while parsing zip iabook: Traceback (most recent call last):
  File "/usr/local/bin/hocr-to-daisy", line 467, in <module>
    dg.process_book_hocr(ebook=daisy_book)
  File "/usr/local/bin/hocr-to-daisy", line 331, in process_book_hocr
    ebook.add_pagetarget(pageno, pageno)
  File "/usr/local/lib/python3.12/site-packages/hocr/daisy/book.py", line 213, in add_pagetarget
    raise ValueError(error_text)
ValueError: Got non-Arabic, non-Roman numeral, or negative pagetarget value
Note: whereas a page number was previously featured in navigation, a non-Arabic, non-Roman one now just shows up as text. The theory is that this stays roughly consistent with how such pages were presented before, insofar as they were presented and not dropped. Dropping them may be preferable, though.
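For illustration, the rule described above could be sketched roughly as follows. This is not the code in this commit: `is_navigable_pageno` and `classify_pages` are hypothetical names, and the strict Roman numeral pattern (1 to 3999) is an assumption about what counts as a valid Roman numeral.

```python
import re

# Assumption: a strict Roman numeral pattern covering 1..3999, case-insensitive.
_ROMAN_RE = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})",
    re.IGNORECASE,
)


def is_navigable_pageno(pageno: str) -> bool:
    """Return True if the page number is a non-negative Arabic or a Roman numeral."""
    text = pageno.strip()
    if not text:
        return False
    # Plain ASCII digits only; a leading minus sign makes this False.
    if text.isascii() and text.isdigit():
        return True
    return _ROMAN_RE.fullmatch(text) is not None


def classify_pages(pagenos):
    """Label each page number as a navigation 'pagetarget' or as plain 'text'."""
    return [
        ("pagetarget" if is_navigable_pageno(p) else "text", p)
        for p in pagenos
    ]
```

With this sketch, `classify_pages(["40", "xiv", "A-1"])` would label the first two as page targets and `A-1` as text, so a caller could skip the page-target call for the latter instead of hitting the `ValueError`.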
Some screenshots to illustrate this.
Page 40 ending, and page 41 starting, with Arabic numerals:
Now in the appendix, pages look like A-1, A-2, etc. Page A-2 ending, and page A-3 starting: