Closed by scottbarnes 2 months ago.
Depends on #13. The relevant changes for this particular PR are in e365988f80b8fdbb2993eb6f3e43c3243124158b.
To the extent IA item metadata is available, it will be used, but now the minimum requirement is an hOCR file and an output file.
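As a rough illustration of the new argument requirements, here is a hypothetical `argparse` sketch. `-f` (hOCR file) and `-o` (output) are required, while `-m` (meta.xml) and `-s` (scandata.xml) are optional. The flag letters come from the invocations in this PR. The `build_parser` helper and the `dest` names are assumptions, and this is not the actual hocr-to-daisy source. The TOC option is left out because its flag isn't shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch mirroring the flags used in this PR's invocations;
    # not taken from the real hocr-to-daisy implementation.
    parser = argparse.ArgumentParser(prog="hocr-to-daisy")
    # Minimum requirement: an hOCR input file and an output path.
    parser.add_argument("-f", dest="hocr_file", required=True)
    parser.add_argument("-o", dest="output", required=True)
    # Optional IA item metadata; left as None when not supplied.
    parser.add_argument("-m", dest="meta_file", default=None)
    parser.add_argument("-s", dest="scandata_file", default=None)
    return parser


args = build_parser().parse_args(["-f", "item_hocr.html", "-o", "out.zip"])
print(args.meta_file, args.scandata_file)  # None None
```

Downstream code would then check whether `meta_file` or `scandata_file` is `None` before using item metadata.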
Minimum invocation:
❯ PYTHONPATH=. ./bin/hocr-to-daisy -f ./sim_english-illustrated-magazine_1884-12_2_15_hocr.html \
    -o test_daisy_output.zip
The sample files come from this Internet Archive item: https://archive.org/details/sim_english-illustrated-magazine_1884-12_2_15
(Nearly) Maximal invocation (without TOC):
❯ PYTHONPATH=. ./bin/hocr-to-daisy \
    -f /home/scott/Downloads/daisy/items/sim_english-illustrated-magazine_1884-12_2_15/sim_english-illustrated-magazine_1884-12_2_15_hocr.html \
    -m /home/scott/Downloads/daisy/items/sim_english-illustrated-magazine_1884-12_2_15/sim_english-illustrated-magazine_1884-12_2_15_meta.xml \
    -s /home/scott/Downloads/daisy/items/sim_english-illustrated-magazine_1884-12_2_15/sim_english-illustrated-magazine_1884-12_2_15_scandata.xml \
    -o test_daisy_output.zip
It's not pictured, but the (nearly) maximal invocation also includes page numbers in the DAISY output, whereas the minimal invocation does not.
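To make the "use metadata to the extent it's available" behavior concrete, here is a minimal, hypothetical helper that assembles an invocation from an item directory, passing `-m` and `-s` only when those files exist. It assumes the directory is named after the IA identifier and follows the `<identifier>_hocr.html` / `_meta.xml` / `_scandata.xml` naming seen in the paths above. `build_command` is an illustrative name, not part of the PR.

```python
from pathlib import Path


def build_command(item_dir: Path, output: str) -> list[str]:
    # Hypothetical helper. Assumption: item_dir is named after the IA
    # identifier and holds <identifier>_hocr.html, _meta.xml, _scandata.xml.
    identifier = item_dir.name
    cmd = ["./bin/hocr-to-daisy", "-f", str(item_dir / f"{identifier}_hocr.html")]
    meta = item_dir / f"{identifier}_meta.xml"
    scandata = item_dir / f"{identifier}_scandata.xml"
    # Optional metadata files are added only when present on disk.
    if meta.exists():
        cmd += ["-m", str(meta)]
    if scandata.exists():
        cmd += ["-s", str(scandata)]
    cmd += ["-o", output]
    return cmd
```

The returned list could be run with `subprocess.run(cmd, env={**os.environ, "PYTHONPATH": "."}, check=True)` from the repository root, mirroring the shell invocations above.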