kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

Import of processes based on the file system #5649

Open matthias-ronge opened 1 year ago

matthias-ronge commented 1 year ago

Scan service providers often digitize large amounts of inventory for Kitodo Production users. The metadata is not always known before processing, so digitization can double as cataloging. For some users, the Kitodo Production instance cannot be accessed from outside for IT security reasons, so service providers cannot create the processes directly in Kitodo Production.

Goal: Provide a way to import processes (including hierarchical processes) bundled with their images. It should also be possible to import processes that were previously created in a different Kitodo Production instance. The content of the process directory is imported, and the newly created process gets a new workflow state based on the production template used for the import.

matthias-ronge commented 1 year ago

Sketch:

Service providers deliver content as folders, each containing a (possibly minimal) meta.xml file for the metadata. Example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<mets xmlns="http://www.loc.gov/METS/" xmlns:m="http://meta.kitodo.org/v1/">
<dmdSec ID="metadata"><mdWrap><xmlData><m:kitodo>

<m:metadata name="###KEY###">###VALUE###</m:metadata>
<m:metadata name="###KEY###">###VALUE###</m:metadata>
<m:metadata name="###KEY###">###VALUE###</m:metadata>

<m:metadataGroup name="###GROUP###">
  <m:metadata name="###KEY###">###VALUE###</m:metadata>
  <m:metadata name="###KEY###">###VALUE###</m:metadata>
</m:metadataGroup>

</m:kitodo></xmlData></mdWrap></dmdSec>
<structMap TYPE="LOGICAL"><div DMDID="metadata" TYPE="###TYPE###"/>
</structMap><structLink/></mets>
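As a rough illustration of how a service provider might generate such a minimal meta.xml, here is a sketch in Python using only the standard library. The function name `build_minimal_meta_xml` and the sample keys in the usage note are hypothetical; valid keys, groups and division types depend on the ruleset of the target instance.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
KITODO_NS = "http://meta.kitodo.org/v1/"


def build_minimal_meta_xml(doc_type, metadata, groups=None):
    """Return a minimal meta.xml string shaped like the example above.

    doc_type: value for the TYPE attribute of the logical root <div>
    metadata: list of (key, value) pairs
    groups:   dict mapping a group name to a list of (key, value) pairs
    (Hypothetical helper, not part of Kitodo.Production.)
    """
    # Serialize METS as the default namespace and Kitodo with the "m" prefix.
    ET.register_namespace("", METS_NS)
    ET.register_namespace("m", KITODO_NS)

    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd_sec = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "metadata"})
    md_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap")
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    kitodo = ET.SubElement(xml_data, f"{{{KITODO_NS}}}kitodo")

    def add_entries(parent, entries):
        for key, value in entries:
            entry = ET.SubElement(parent, f"{{{KITODO_NS}}}metadata", {"name": key})
            entry.text = value

    add_entries(kitodo, metadata)
    for group_name, entries in (groups or {}).items():
        group = ET.SubElement(
            kitodo, f"{{{KITODO_NS}}}metadataGroup", {"name": group_name}
        )
        add_entries(group, entries)

    # Logical structure: one root division that points at the metadata section.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", {"TYPE": "LOGICAL"})
    ET.SubElement(
        struct_map, f"{{{METS_NS}}}div", {"DMDID": "metadata", "TYPE": doc_type}
    )
    ET.SubElement(mets, f"{{{METS_NS}}}structLink")
    return ET.tostring(mets, encoding="unicode")
```

A call such as `build_minimal_meta_xml("Monograph", [("TitleDocMain", "A title")])` would fill the `###TYPE###`, `###KEY###` and `###VALUE###` placeholders; those sample values are made up for illustration. The XML declaration is omitted here and would need to be added when the file is written.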

The folder contains the subfolders (according to order). The process title may not yet be known (or may differ after import); the import function replaces it during import, after the process has been created.

📁 box24
   🖹 meta.xml

📁 box24_doc1
   🖹 meta.xml
   📁 images
      📁 NN_media
        🖻 00000001.jpg

📁 box24_doc2
   🖹 meta.xml
   📁 images
      📁 NN_media
        🖻 00000001.jpg
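To make the layout above concrete, the following sketch shows how an import routine might discover deliverable folders under a mounted source directory. It is only an assumption about how such a scan could work: the function name `discover_deliveries` and the set of accepted image suffixes are hypothetical, and the actual import would be implemented inside Kitodo.Production.

```python
from pathlib import Path

# Assumption: which file suffixes count as images is not specified in the sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def discover_deliveries(source_root):
    """List (folder name, relative image paths) for each importable folder.

    A direct subfolder of source_root counts as importable when it contains
    a meta.xml file. Images are looked up below its "images" subfolder, if any.
    """
    found = []
    for folder in sorted(Path(source_root).iterdir()):
        if not folder.is_dir() or not (folder / "meta.xml").is_file():
            continue  # not a process folder
        images_dir = folder / "images"
        images = []
        if images_dir.is_dir():
            images = sorted(
                path.relative_to(folder).as_posix()
                for path in images_dir.rglob("*")
                if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
            )
        found.append((folder.name, images))
    return found
```

For the tree above, this would report `box24` without images (it is the parent of the hierarchy) and `box24_doc1` and `box24_doc2` with one image each.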

Hierarchical processes must contain the <mptr> links to their children. A link may contain a relative path or, when it corresponds to the folder name, the database ID reference from the source system.

<!-- ... (the root <mets> element must also declare xmlns:xlink="http://www.w3.org/1999/xlink") -->
<structMap TYPE="LOGICAL"><div DMDID="metadata" TYPE="###TYPE###">

<div><mptr xlink:href="../box24_doc1/meta.xml"/></div>
<div><mptr xlink:href="../box24_doc2/meta.xml"/></div>

</div></structMap></mets>
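The relative-path variant of these links could be resolved to child folders roughly as follows. This is a sketch under assumptions: `child_folders` is a hypothetical helper, it expects the `xlink` prefix to be bound to the standard XLink namespace, and it does not handle the database ID variant.

```python
import posixpath
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def child_folders(parent_folder, meta_xml_text):
    """Return the folder names that the parent's <mptr> links point to.

    parent_folder: name of the folder holding the parent's meta.xml
    meta_xml_text: content of that meta.xml as a string
    """
    root = ET.fromstring(meta_xml_text)
    children = []
    for mptr in root.iter(f"{{{METS_NS}}}mptr"):
        href = mptr.get(f"{{{XLINK_NS}}}href")
        if not href:
            continue  # link without a target, nothing to resolve
        # Resolve the link relative to the folder of the parent's meta.xml.
        target = posixpath.normpath(posixpath.join(parent_folder, href))
        children.append(posixpath.basename(posixpath.dirname(target)))
    return children
```

With the parent in `box24`, the link `../box24_doc1/meta.xml` resolves to the sibling folder `box24_doc1`, so the importer would know which processes to create as children and in which order.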

The import function will add missing parts of the meta.xml, such as reading in the images, while keeping information that is already in place. The source data must be mounted on the server side. The import function can be started as a KitodoScript. Example:

action:importProcesses templateid:123 source:/mnt/usb01/metadata/*

It will run as a long-running task in the Task Manager.
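To illustrate what the script line above would need to carry, here is a sketch that splits it into parameters and expands the `source` wildcard into folders. It is not how Kitodo.Production parses KitodoScript; `parse_script_line` and `expand_sources` are hypothetical names, and the splitting rules (whitespace-separated `key:value` tokens) are an assumption based on the example.

```python
import glob
import os
import shlex


def parse_script_line(line):
    """Split 'key:value key:value ...' into a dict of parameters.

    Only the first colon of each token separates key and value, so values
    may themselves contain colons.
    """
    params = {}
    for token in shlex.split(line):
        key, separator, value = token.partition(":")
        if not separator or not key:
            raise ValueError(f"expected key:value, got {token!r}")
        params[key] = value
    return params


def expand_sources(pattern):
    """Return the directories matching the source pattern, sorted by name."""
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))
```

Parsing the example line yields the action `importProcesses`, the template ID `123` and the source pattern `/mnt/usb01/metadata/*`; each directory matched by the pattern would then become one unit of work for the long-running task.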