mlibrary / rsvp

Ruby SIP Validation and Processing

add pagetagging functionality #131

Open huangjqq opened 2 years ago

huangjqq commented 2 years ago

After image processing (which is what the rsvp tool currently does), we need to add pagetags using the script located at /quod-prep/prep/o/ocr/bin/tag-ocr-output.rb. However, the place where the pagetags are stored is apparently no longer accessible, because the custom in-house tool used to access it relied on an old, now-defunct version of Java. Pagetags are necessary to provide page structure (keeping images and OCR aligned and described) in DLXS text class through the pageview.dat file, and for HT through the meta.yml file.

A successful pageview tool would do the following:

huangjqq commented 2 years ago

Ideally, a single tool would let Chris/Jackson (for DLXS) and DCU scanning technicians add pagetags in the same format, rather than updating pageview.dat by hand.

It would also be useful for the tool to be able to apply different number/sequence types (arabic numerals, roman numerals, capital and lowercase letters, etc.) to selected ranges, and to allow manually entering something specific when a page doesn't fit a pattern.
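
To make the range-based numbering idea concrete, here is a rough Ruby sketch of how labels might be generated for a range of image sequence numbers, with a manual override for pages that don't fit a pattern. Everything in it is an assumption for illustration: the method names (`to_roman`, `to_letters`, `page_labels`), the style symbols, and the hash-of-labels shape are hypothetical and are not taken from rsvp or tag-ocr-output.rb, and it does not attempt to write pageview.dat or meta.yml.

```ruby
# Hypothetical sketch only; none of these names exist in rsvp.

ROMAN_VALUES = {
  1000 => "m", 900 => "cm", 500 => "d", 400 => "cd",
  100 => "c", 90 => "xc", 50 => "l", 40 => "xl",
  10 => "x", 9 => "ix", 5 => "v", 4 => "iv", 1 => "i"
}.freeze

# Convert a positive integer to a lowercase roman numeral.
def to_roman(n)
  raise ArgumentError, "need a positive integer" unless n.is_a?(Integer) && n > 0
  ROMAN_VALUES.reduce(+"") do |out, (value, glyph)|
    count, n = n.divmod(value)
    out << glyph * count
  end
end

# Convert a positive integer to letters: 1 => "a", 26 => "z", 27 => "aa".
def to_letters(n)
  raise ArgumentError, "need a positive integer" unless n.is_a?(Integer) && n > 0
  out = +""
  while n > 0
    n, rem = (n - 1).divmod(26)
    out.prepend((97 + rem).chr)
  end
  out
end

# Build { image sequence number => printed page label } for one range.
def page_labels(first_seq:, last_seq:, style:, start_at: 1)
  (first_seq..last_seq).each_with_index.map do |seq, offset|
    number = start_at + offset
    label = case style
            when :arabic      then number.to_s
            when :roman_lower then to_roman(number)
            when :roman_upper then to_roman(number).upcase
            when :alpha_lower then to_letters(number)
            when :alpha_upper then to_letters(number).upcase
            else raise ArgumentError, "unknown style: #{style}"
            end
    [seq, label]
  end.to_h
end

# Front matter in roman numerals, body in arabic numerals,
# then one manually entered label for a page that fits neither pattern.
labels = page_labels(first_seq: 1, last_seq: 4, style: :roman_lower)
         .merge(page_labels(first_seq: 5, last_seq: 7, style: :arabic))
         .merge(6 => "2a")
# seq 1..4 get i, ii, iii, iv; seq 5 gets 1; seq 6 gets 2a (manual); seq 7 gets 3
```

Keeping the labels in a plain hash keyed by sequence number means a manual entry is just a later `merge`, and the same structure could in principle feed both a pageview.dat writer and a meta.yml writer, though those output formats are out of scope for this sketch.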