slub / ocrd_kitodo

Docker integration of Kitodo.Production and OCR-D
MIT License

configure and edit workflows flexibly via Monitor #36

Open bertsky opened 2 years ago

bertsky commented 2 years ago

Currently, OCR workflows must be installed into Production in advance by placing the ocrd process script files into ./kitodo/data/ocr_workflows with a .sh suffix, and then configuring them in the project settings (thereby tying them to new processes).

But what if a user wants to use a different OCR workflow for some processes in a project, or change the workflow for existing processes (because they failed, did not run to completion, or the results do not look good)?

For now, one would need to edit the file ocr-workflow.sh in the process directory and re-trigger the OCR script. (OCR processing itself is already incremental, so the workflow will then continue to build whatever is still missing or out of date.) But that is tedious and requires access to the file system (Manager share).

The user experience could be much better if we made workflows configurable on the web pages of the Monitor. Crucially, we should allow editing and re-running OCR workflows:

  1. create a volume for kitodo/data/ocr_workflows to be shared by Production, Manager and Monitor
  2. add an endpoint (and reference it on the index page) for listing existing workflows
  3. make workflows editable (in a simple text form field, perhaps with syntax highlighting), create a new version when saving
  4. in the workspace view, make workspaces multi-selectable and add an action button for (re-)processing with a selectable workflow
  5. in the job view, add an action button for re-processing with a selectable workflow
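
To illustrate step 2, here is a minimal sketch of what the listing endpoint could return, assuming the shared volume from step 1 is mounted somewhere the Monitor can read. The function name `list_workflows` and the returned fields are hypothetical, not part of the existing Monitor code; wiring it into the web framework is left out.

```python
from pathlib import Path


def list_workflows(directory):
    """Return the workflow scripts (*.sh) found directly in `directory`, sorted by name.

    `directory` is assumed to be the mount point of the shared
    ocr_workflows volume (hypothetical; depends on the compose setup).
    """
    root = Path(directory)
    if not root.is_dir():
        # Volume not mounted or empty deployment: report no workflows.
        return []
    return [
        {"name": path.stem, "file": path.name, "size": path.stat().st_size}
        for path in sorted(root.glob("*.sh"))
        if path.is_file()
    ]
```

The same list could feed the workflow selector for the (re-)processing buttons in steps 4 and 5.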

So if a task cannot be finished because the OCR workflow failed (which in the future could also mean that it did not meet the configured quality threshold), one can manually trigger said re-processing.

We could even provide a null workflow that will always fail and therefore force you to choose your custom workflow dynamically (per-process).

Saved workflows could also be version-controlled. The workflows should have a free-form description, but their file name should be a hash of their (non-comment, non-whitespace) content.
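
A minimal sketch of the content-hash naming, under a few assumptions of mine: SHA-256 as the hash, only full-line `#` comments (including the shebang) count as comments, and whitespace is normalized rather than deleted entirely. The function name `workflow_file_name` is hypothetical.

```python
import hashlib


def workflow_file_name(script_text, suffix=".sh"):
    """Derive a file name from the significant content of a workflow script.

    Assumptions (not settled in this issue): blank lines and full-line
    comments are ignored, runs of whitespace collapse to a single space,
    and SHA-256 is the hash function.
    """
    significant = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines (incl. shebang)
        significant.append(" ".join(stripped.split()))
    digest = hashlib.sha256("\n".join(significant).encode("utf-8")).hexdigest()
    return digest + suffix
```

Two scripts that differ only in comments, blank lines or spacing then map to the same file name, so saving a re-indented or re-commented copy would not create a new version. Deleting whitespace entirely would be the stricter reading, but it could make differing parameter strings collide. Trailing comments are deliberately kept, since `#` can also occur inside parameter values.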

Also, the Manager should collect statistics about all workflows (which ones ran how often and with what success or quality level), so the Monitor can show them.
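
As a rough sketch of the aggregation, assuming the Manager records one `(workflow_id, succeeded)` pair per finished job (a hypothetical record format; a quality level could be added as a third field later):

```python
from collections import defaultdict


def summarize_runs(runs):
    """Aggregate (workflow_id, succeeded) records into per-workflow counts."""
    stats = defaultdict(lambda: {"runs": 0, "succeeded": 0})
    for workflow_id, succeeded in runs:
        entry = stats[workflow_id]
        entry["runs"] += 1
        if succeeded:
            entry["succeeded"] += 1
    # Every entry has runs >= 1, so the division is safe.
    return {
        workflow_id: {**entry, "success_rate": entry["succeeded"] / entry["runs"]}
        for workflow_id, entry in stats.items()
    }
```

If workflow files are named by content hash, that name would be a natural `workflow_id` here, so statistics stay attached to a workflow even when its description changes.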