kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

Naming of sub directories on export #4752

Open henning-gerhardt opened 2 years ago

henning-gerhardt commented 2 years ago

Exporting a process in the current Kitodo.Production version results in a different subdirectory structure than in 2.x.

In 2.x, the following directory structure was created on an export with images and OCR data:

<processtitle>/
 - <processtitle>*.xml 
 - <processtitle>_ocr/ (only created if there is any OCR data)
 - <processtitle>_tif/ (containing the image files)

In 3.x this structure is created

<processtitle>
 - <processtitle>.xml
 - ocr/<processtitle>_<ocr format>/ (created even if there is no OCR data; <ocr format> could be alto or tei, depending on the OCR system used)
 - images/scans_tif/ (containing the image files)

The folder configuration looks like

OCR (ALTO) Screenshot_2021-10-14_09-32-36

LOCAL Screenshot_2021-10-14_09-33-10

It looks to me as if the folder names of the internal structure are taken one-to-one on export.

Is there a way to change the naming of the exported directories?

matthias-ronge commented 2 years ago

In 3.x, on export, a folder is created for the process, and every folder that is set to “copy on export” in the folder settings of the project is copied into it. There is currently no way to change this.

So, from metadata directory:

📁 metadata/
  📁 42/
    📁 images/
      📁 1234567X_tif/  [SOURCE]
    📁 jpgs/
      📁 default/       [DEFAULT]
      📁 thumbs/        [THUMBS]
    📁 ocr/
      📁 alto/          [TXT]
      📁 pdf/           [PDF]
    📄 meta.xml

If you choose DEFAULT, THUMBS and PDF to be exported, you will get in the hotfolder:

📁 hotfolder/
  📁 1234567X/
    📁 jpgs/
      📁 default/       [DEFAULT]
      📁 thumbs/        [THUMBS]
    📁 ocr/
      📁 pdf/           [PDF]
    📄 1234567X.xml

There is currently no way to configure a different name for a folder in the hotfolder; the names will be the same as in the process directory. There is currently no way to skip copying a folder if it is empty, either.
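The copy behaviour described above can be illustrated with a minimal Python sketch. This is not Kitodo.Production code: the function `export_process` and its parameters are hypothetical, and the real export generates the XML file rather than copying `meta.xml` (copying is only a stand-in here). It only shows that the selected folders keep their internal relative paths and that empty folders are copied as well.

```python
import shutil
from pathlib import Path


def export_process(process_dir, hotfolder, title, export_folders):
    """Copy the selected internal folders to hotfolder/<title>/ (illustration only)."""
    process_dir = Path(process_dir)
    target = Path(hotfolder) / title
    target.mkdir(parents=True, exist_ok=True)
    for rel in export_folders:
        # The relative path is reused one-to-one, so the exported folder
        # has the same name as the internal one. Empty folders are copied too.
        shutil.copytree(process_dir / rel, target / rel, dirs_exist_ok=True)
    # Stand-in for the exported metadata file (assumption: a plain copy).
    shutil.copy2(process_dir / "meta.xml", target / f"{title}.xml")
    return target
```

For the example above, a call such as `export_process("metadata/42", "hotfolder", "1234567X", ["jpgs/default", "jpgs/thumbs", "ocr/pdf"])` would produce the hotfolder tree shown.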

To change the names of the exported folders, you have to change the names of the internal folders.

henning-gerhardt commented 2 years ago

Changing the names of the internal folders is not really possible, as I would then have to change them for over 480,000 processes.

So I need time to adjust the post-processing application, which provides the processed data to our presentation system, to these new "requirements".
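A post-processing step of this kind could be sketched as follows. This is only an assumption about what such an adjustment might look like, not the actual application: the function names `to_2x_layout` and `_prune_empty` are hypothetical, and the folder names `images/scans_tif` and `ocr/<processtitle>_<ocr format>` are taken from the 3.x structure listed at the top of this issue. The sketch renames a 3.x export in place to the 2.x naming and drops OCR folders that contain no data.

```python
import shutil
from pathlib import Path


def _prune_empty(folder):
    """Remove empty subfolders of folder, then folder itself if it is empty."""
    if not folder.is_dir():
        return
    for sub in list(folder.iterdir()):
        if sub.is_dir() and not any(sub.iterdir()):
            sub.rmdir()
    if not any(folder.iterdir()):
        folder.rmdir()


def to_2x_layout(export_dir):
    """Rearrange a 3.x export folder in place to the 2.x naming (sketch)."""
    export_dir = Path(export_dir)
    title = export_dir.name  # assumes the folder is named after the process title

    # images/scans_tif/ -> <processtitle>_tif/
    scans = export_dir / "images" / "scans_tif"
    if scans.is_dir():
        shutil.move(str(scans), str(export_dir / f"{title}_tif"))

    # ocr/<processtitle>_<ocr format>/ -> <processtitle>_ocr/, only if it has data
    ocr_root = export_dir / "ocr"
    if ocr_root.is_dir():
        for sub in sorted(ocr_root.iterdir()):
            if sub.is_dir() and any(sub.iterdir()):
                shutil.move(str(sub), str(export_dir / f"{title}_ocr"))
                break  # 2.x had a single OCR folder; any others are left in ocr/

    # Clean up container folders that are now empty
    _prune_empty(export_dir / "images")
    _prune_empty(ocr_root)
```

If an export contains more than one non-empty OCR folder, only the first one (in sorted order) is renamed and the rest stay under `ocr/`, so nothing is deleted silently.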