An extensible viewer for OCR-D mets.xml files
AlternativeImage on any level of the structural hierarchy)
OCR-D Browser requires Python 3.7 or higher.
The native installation requires GTK 3.
In any case, you need a virtual environment with a current pip version (>=20), preferably your existing OCR-D venv. To install from source:
git clone https://github.com/hnesk/browse-ocrd.git
cd browse-ocrd
sudo make deps-ubuntu
make install
Alternatively, install via pip after installing the required system libraries:
sudo apt install libcairo2-dev libgirepository1.0-dev
pip install browse-ocrd
If you have installed Docker, you can build OCR-D Browser as a web service:
docker build -t ocrd_browser .
Or use a prebuilt image from Docker Hub:
docker pull hnesk/ocrd_browser
Start the app with the filesystem path to the METS file of your OCR-D workspace:
browse-ocrd ./path/to/mets.xml
You can still open another METS file from the UI though.
When running the web service, you need to pass a directory DATADIR that (recursively) contains all the workspaces you want to serve.
The top-level entry point http://localhost/ shows an index page with a link http://localhost/browse/... for each workspace path.
Each link runs browse-ocrd on that workspace in the background and then redirects your browser to the internal Broadway server, which renders the app in the web browser.
To start the service, run:
docker run -it --rm -v DATADIR:/data -p 8085:8085 -p 8080:8080 ocrd_browser
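To illustrate what "recursively contains all the workspaces" means for the index page, here is a minimal Python sketch. It assumes that a workspace is any directory holding a file named mets.xml; find_workspaces is a hypothetical helper for illustration, not part of browse-ocrd, and the actual service may detect workspaces differently.

```python
import os


def find_workspaces(datadir, mets_name="mets.xml"):
    """Return all directories below datadir that contain a METS file.

    Paths are returned relative to datadir and sorted, which is roughly
    what an index page would need to list one entry per workspace.
    Assumption: a workspace is identified by a file called mets.xml.
    """
    found = []
    for root, _dirs, files in os.walk(datadir):
        if mets_name in files:
            found.append(os.path.relpath(root, datadir))
    return sorted(found)
```

Calling find_workspaces("/path/to/DATADIR") on the host shows which workspaces the mounted /data volume would expose.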
At startup, the following directories are searched for a config file named ocrd-browser.conf:
# directories and their default values under Ubuntu 20.04
GLib.get_system_config_dirs() # '/etc/xdg/xdg-ubuntu/ocrd-browser.conf', '/etc/xdg/ocrd-browser.conf'
GLib.get_user_config_dir() # '/home/jk/.config/ocrd-browser.conf'
os.getcwd() # './ocrd-browser.conf'
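The search locations can be approximated without GLib. The following Python sketch assumes the usual XDG conventions behind the GLib calls (XDG_CONFIG_DIRS defaulting to /etc/xdg, XDG_CONFIG_HOME defaulting to ~/.config); candidate_config_paths is a hypothetical helper, and the sketch only lists candidate files without deciding which one takes precedence.

```python
import os

CONFIG_NAME = "ocrd-browser.conf"


def candidate_config_paths(environ, cwd):
    """List candidate config files: system dirs, then user dir, then cwd.

    Assumption: this mirrors GLib.get_system_config_dirs() and
    GLib.get_user_config_dir() via the XDG environment variables.
    """
    system_dirs = (environ.get("XDG_CONFIG_DIRS") or "/etc/xdg").split(":")
    home = environ.get("HOME", "")
    user_dir = environ.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    dirs = [d for d in system_dirs if d] + [user_dir, cwd]
    return [os.path.join(d, CONFIG_NAME) for d in dirs]


if __name__ == "__main__":
    for path in candidate_config_paths(os.environ, os.getcwd()):
        marker = "found" if os.path.isfile(path) else "missing"
        print(f"{marker}: {path}")
```

Running the script prints each candidate path and whether a file exists there, which helps when a setting does not seem to be picked up.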
The ocrd-browser.conf file is an INI file with the following sections and keys:
[FileGroups]
# Preferred fileGrp names for thumbnail display in the Page Browser
# Comma separated list of regular expressions
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL
# Each Tool has a section header [Tool XYZ]
# At the moment the only defined tool is "PageViewer"
[Tool PageViewer]
# shell commandline to execute with placeholders
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
Note: You can get PRImA's PageViewer on GitHub.
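To check a config file before starting the app, it can be read with Python's standard configparser. This is a minimal sketch, not the app's own loader: load_config, preferred_image_patterns, pick_file_group and tools are hypothetical helpers, and matching each pattern against the whole fileGrp name (fullmatch) in order of preference is an assumption.

```python
import configparser
import re

SAMPLE = """
[FileGroups]
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL

[Tool PageViewer]
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
"""


def load_config(text):
    """Parse INI text; keep key case and disable %-interpolation."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve camelCase keys such as preferredImages
    parser.read_string(text)
    return parser


def preferred_image_patterns(parser):
    """Split the comma-separated preferredImages value into compiled regexes."""
    raw = parser.get("FileGroups", "preferredImages", fallback="")
    return [re.compile(p.strip()) for p in raw.split(",") if p.strip()]


def pick_file_group(file_groups, patterns):
    """Return the first fileGrp matched by the earliest pattern (assumed order)."""
    for pattern in patterns:
        for group in file_groups:
            if pattern.fullmatch(group):
                return group
    return None


def tools(parser):
    """Map tool names to their commandline, based on [Tool XYZ] sections."""
    prefix = "Tool "
    return {
        section[len(prefix):]: parser[section]["commandline"]
        for section in parser.sections()
        if section.startswith(prefix)
    }
```

For example, pick_file_group(["OCR-D-SEG", "OCR-D-IMG-BIN", "ORIGINAL"], preferred_image_patterns(load_config(SAMPLE))) selects OCR-D-IMG-BIN under these assumptions, because the second pattern matches before ORIGINAL is tried.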
The commandline string is used as a Python format string with the following keyword arguments:
workspace: The current ocrd.Workspace. All properties are shell-escaped (with shlex.quote) automatically.
file: The current ocrd_models.OcrdFile. All properties are shell-escaped (with shlex.quote) automatically. There is also an additional property path with the properties absolute and relative, so {file.path.absolute} is replaced by the shell-quoted absolute path of the file.
Configuration values can be set or overridden through environment variables. The variable names follow the structure BROCRD__{SECTION}__{KEY}, where SECTION and KEY are in upper snake case and separated by a double underscore (__). If the section title contains spaces, its individual words are also separated by __.
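The naming scheme can be read back mechanically. The following Python sketch splits a variable name into its section words and key, assuming only the structure described here; parse_override and collect_overrides are hypothetical helpers, and mapping the upper snake case names back to the spelling used in the config file is deliberately left out because that rule is not spelled out.

```python
PREFIX = "BROCRD"


def parse_override(name):
    """Split an override variable name into (section_words, key).

    Returns None if the name does not follow PREFIX__SECTION__KEY.
    The last part is the key; everything between prefix and key is
    one section word per part.
    """
    parts = name.split("__")
    if len(parts) < 3 or parts[0] != PREFIX or not all(parts):
        return None
    return parts[1:-1], parts[-1]


def collect_overrides(environ):
    """Collect all override variables from an environment mapping."""
    overrides = {}
    for name, value in environ.items():
        parsed = parse_override(name)
        if parsed is not None:
            section_words, key = parsed
            overrides[(tuple(section_words), key)] = value
    return overrides
```

Passing os.environ to collect_overrides lists every override that is currently set, which is a quick way to spot a misspelled variable name.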
Some examples:
BROCRD__FILE_GROUPS__PREFERRED_IMAGES='THUMB'
BROCRD__TOOL__PAGEVIEWER__COMMANDLINE='ls {file.path.absolute}'
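The effect of shell-escaping every property in a commandline template, such as the ls example, can be illustrated with a small wrapper. This is a sketch under stated assumptions, not the app's implementation: Quoted and build_command are hypothetical names, and SimpleNamespace objects stand in for the real ocrd.Workspace and ocrd_models.OcrdFile.

```python
import shlex
from types import SimpleNamespace


class Quoted:
    """Wrap an object so that any attribute chain formats as a shell-quoted string."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself,
        # so lookups are forwarded to the wrapped object and re-wrapped.
        return Quoted(getattr(self._target, name))

    def __str__(self):
        # str.format() with an empty format spec ends up calling str().
        return shlex.quote(str(self._target))


def build_command(template, workspace, file):
    """Fill a commandline template with shell-quoted property values."""
    return template.format(workspace=Quoted(workspace), file=Quoted(file))


if __name__ == "__main__":
    # Stand-in objects with made-up paths, for illustration only.
    workspace = SimpleNamespace(directory="/data/my workspace")
    file = SimpleNamespace(
        path=SimpleNamespace(
            absolute="/data/my workspace/OCR-D-IMG/page 1.tif",
            relative="OCR-D-IMG/page 1.tif",
        )
    )
    print(build_command("ls {file.path.absolute}", workspace, file))
```

Because each value is quoted individually, a path containing spaces still reaches the tool as a single argument.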