Open huangjqq opened 2 years ago
A single tool that lets Chris/Jackson (for DLXS) and the DCU scanning technicians add pagetags in the same format would be ideal, rather than relying on updating pageview.dat by hand.
It would also be useful for the tool to be able to add different number/sequence types (arabic numerals, roman numerals, capital and lowercase letters, etc.) for selected ranges, and to allow manually entering something specific when a page doesn't fit a pattern.
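To make the range-labeling request concrete, here is a rough sketch in Ruby (chosen only because the existing tagging script is Ruby). Everything in it is hypothetical: the `PageLabels` module, the style names, and the `overrides` option are illustrative and do not come from any existing tool, and it does not read or write pageview.dat or meta.yml.

```ruby
# Hypothetical sketch: module, method, and style names are illustrative only.
module PageLabels
  ROMAN = [
    [1000, "m"], [900, "cm"], [500, "d"], [400, "cd"],
    [100, "c"], [90, "xc"], [50, "l"], [40, "xl"],
    [10, "x"], [9, "ix"], [5, "v"], [4, "iv"], [1, "i"]
  ].freeze

  # Convert a positive integer to a lowercase roman numeral.
  def self.roman(n)
    raise ArgumentError, "roman numerals need n >= 1" if n < 1
    ROMAN.reduce(+"") do |out, (value, glyph)|
      count, n = n.divmod(value)
      out << glyph * count
    end
  end

  # Convert a positive integer to a letter sequence: 1 => "a", 26 => "z", 27 => "aa".
  def self.alpha(n)
    raise ArgumentError, "letter labels need n >= 1" if n < 1
    out = +""
    while n > 0
      n, rem = (n - 1).divmod(26)
      out.prepend(("a".ord + rem).chr)
    end
    out
  end

  # Render one label in the requested style.
  def self.label(style, n)
    case style
    when :arabic      then n.to_s
    when :roman_lower then roman(n)
    when :roman_upper then roman(n).upcase
    when :alpha_lower then alpha(n)
    when :alpha_upper then alpha(n).upcase
    else raise ArgumentError, "unknown style: #{style.inspect}"
    end
  end

  # Label image sequence numbers first_seq..last_seq, counting from `start`.
  # `overrides` maps a sequence number to a manually entered label.
  def self.for_range(first_seq, last_seq, style:, start: 1, overrides: {})
    (first_seq..last_seq).each_with_index.map do |seq, i|
      [seq, overrides.fetch(seq) { label(style, start + i) }]
    end.to_h
  end
end

front = PageLabels.for_range(1, 4, style: :roman_lower)
body  = PageLabels.for_range(5, 7, style: :arabic, overrides: { 6 => "plate 1" })
```

In this sketch a manual override still consumes a number from the sequence, so the page after "plate 1" is labeled "3". Whether overridden pages should advance the counter is an open design question for the real tool.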
After image processing (currently done with the rsvp tool), we need to add pagetags using the script located at /quod-prep/prep/o/ocr/bin/tag-ocr-output.rb. But apparently the place where the pagetags are stored is no longer accessible, because the custom in-house tool used to access it relied on an old, now-defunct version of Java. Pagetags are necessary to provide page structuring (keeping images/OCR aligned and described) in DLXS text class through the pageview.dat file and for HT through the meta.yml file.
A successful pageview tool would do the following: