hnesk / browse-ocrd

An extensible viewer for OCR-D mets.xml files
MIT License

add PAGE annotation view #15

Closed bertsky closed 2 years ago

bertsky commented 3 years ago

Without re-inventing the wheel for displaying PAGE annotations, there is a lot of added value in having a view option for them here. PageViewer does not know of OCR-D's relative path convention and can only show isolated pages; LAREX cannot cope with METS and OCR-D directory structures.

We already discussed integrating PageViewer loosely by just triggering a command-line call (in the simplest case, using --resolve-dir workspace-directory), or adding some IPC capability to PageViewer itself and then remote-controlling it from ocrd_browser.

Alternatively, one might be able to integrate nw-page-editor's HTML via GTK's WebKit component.

bertsky commented 3 years ago

As for the PageViewer CLI call, in the simplest case we could add a button to the XmlView action bar. We should offer some way to configure the exact command line to use, though. Perhaps in the menu under a new "settings" widget? As a first step, we could just query an environment variable, so we would have to run e.g. export PAGEVIEWER="java -jar path/to/JPageViewer.jar" before starting ocrd_browser.
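A minimal sketch of the environment-variable idea, written in Python because ocrd_browser is a Python application. Only the variable name PAGEVIEWER comes from the comment above; the helper name `pageviewer_base_command` and its behaviour when the variable is unset are assumptions for illustration, not existing browse-ocrd code.

```python
import os
import shlex
from typing import List, Mapping, Optional


def pageviewer_base_command(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[List[str]]:
    """Return the configured PageViewer command as an argv list, or None.

    Hypothetical helper: reads PAGEVIEWER and splits it the way a POSIX
    shell would, so a quoted path containing spaces stays one argument.
    """
    env = os.environ if env is None else env
    value = env.get("PAGEVIEWER", "").strip()
    if not value:
        # Not configured: a caller could hide or disable the button.
        return None
    return shlex.split(value)
```

Splitting with `shlex` rather than `str.split` is one possible design choice here; it lets the user quote a jar path that contains spaces.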

mikegerber commented 3 years ago

PageViewer does not know of OCR-D's relative path convention

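# Open a PAGE-XML file in JPageViewer, resolving image paths against the
# OCR-D workspace (the directory containing mets.xml) when there is one.
# Assumes $_jpageviewer_jar is set elsewhere to the path of the JPageViewer jar.
jpageviewer () {
    _jpageviewer_resolve_dir=$(dirname "$1")
    # PAGE files usually sit in a file group directory one level below mets.xml
    if [ -e "$_jpageviewer_resolve_dir"/../mets.xml ]
    then
        _jpageviewer_resolve_dir="$_jpageviewer_resolve_dir"/..
    fi
    java -Dhttp.proxyHost=http-proxy.sbb.spk-berlin.de -Dhttp.proxyPort=3128 -jar "$_jpageviewer_jar" --resolve-dir "$_jpageviewer_resolve_dir" "$1"
    unset _jpageviewer_resolve_dir
}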

(Of course, the proxy settings are specific to my environment.)

bertsky commented 3 years ago

@mikegerber thanks for sharing your recipe – but I think in this case we don't need to guess where the workspace directory is relative to the PAGE file path, because we already control all (absolute) paths. We can just call whatever the user configured as base command and append --resolve-dir workspace-directory page-file image-file. (The resolve-dir arg is still useful because the user might want to change to a different file interactively.)
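The call described above could be assembled roughly as follows. This is a sketch under assumptions: the function names `build_pageviewer_argv` and `launch_pageviewer` are hypothetical, and only the argument order (--resolve-dir workspace-directory page-file image-file) is taken from the comment.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_pageviewer_argv(
    base_command: Sequence[str],
    workspace_dir: Path,
    page_file: Path,
    image_file: Path,
) -> List[str]:
    """Append --resolve-dir and the two file arguments to the base command.

    All paths are expected to be absolute already, so nothing is guessed
    from the location of the PAGE file.
    """
    return [
        *base_command,
        "--resolve-dir", str(workspace_dir),
        str(page_file),
        str(image_file),
    ]


def launch_pageviewer(argv: Sequence[str]) -> subprocess.Popen:
    """Start the viewer without waiting for it, so a GUI stays responsive."""
    return subprocess.Popen(list(argv))
```

Passing an argv list to `subprocess.Popen` instead of a shell string avoids quoting problems with paths that contain spaces.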

bertsky commented 3 years ago

#21 brought a partial fix.

mikegerber commented 3 years ago

Without re-inventing the wheel for displaying PAGE annotations,

Not sure if avoiding re-inventing the wheel is the right thing here, it's just drawing a few polygons... Page Viewer is good but it also has the problem that it's hard to fix problems or add functionality
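To illustrate the "few polygons": in current PAGE-XML versions a region's outline is a Coords element whose points attribute holds space-separated x,y pairs. The sketch below only extracts those outlines with the Python standard library and leaves the drawing to whatever toolkit is used. The function names are hypothetical, matching elements by local name is a simplification to stay independent of the schema version, and none of this is meant to describe how browse-ocrd implements its view.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

Polygon = List[Tuple[int, int]]


def parse_points(points: str) -> Polygon:
    """Turn a PAGE points string such as '0,0 10,0 10,5' into (x, y) tuples."""
    polygon = []
    for pair in points.split():
        x, y = pair.split(",")
        polygon.append((int(x), int(y)))
    return polygon


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def region_polygons(page_xml: str) -> Dict[Optional[str], Polygon]:
    """Map each region id (TextRegion, ImageRegion, ...) to its outline."""
    root = ET.fromstring(page_xml)
    polygons = {}
    for element in root.iter():
        if not local_name(element.tag).endswith("Region"):
            continue
        # Only direct Coords children belong to the region itself;
        # nested lines and words carry their own Coords.
        for child in element:
            if local_name(child.tag) == "Coords" and child.get("points"):
                polygons[element.get("id")] = parse_points(child.get("points"))
    return polygons


# Tiny hand-written sample, not taken from a real workspace.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="OCR-D-IMG/p1.png" imageWidth="100" imageHeight="50">
    <TextRegion id="r1">
      <Coords points="0,0 100,0 100,50 0,50"/>
      <TextLine id="l1"><Coords points="5,5 95,5 95,20 5,20"/></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(region_polygons(SAMPLE))
```

Each resulting list of tuples can be handed to a polygon-drawing call of the chosen graphics library, scaled to the displayed image size.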

bertsky commented 3 years ago

Not sure if avoiding re-inventing the wheel is the right thing here, it's just drawing a few polygons... Page Viewer is good but it also has the problem that it's hard to fix problems or add functionality

I fully agree – hence this recommendation

hnesk commented 2 years ago

You were right, that wheel wasn't that hard to reinvent. In the pageview branch (https://github.com/hnesk/browse-ocrd/pull/30) there is an experimental PageViewer-like view. I expect bugs, because it required some structural changes to other parts of browse-ocrd, so any testing is much appreciated.