kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
GNU General Public License v3.0

Naming of sub directories on export #4752

Open henning-gerhardt opened 2 years ago

henning-gerhardt commented 2 years ago

Exporting a process in the current Kitodo.Production version results in a different sub-directory structure than in 2.x.

In 2.x, the following directory structure was created on an export with images and OCR data:

 - <processtitle>*.xml 
 - <processtitle>_ocr/ (only created if there is any OCR data)
 - <processtitle>_tif/ (containing the image files)

In 3.x, this structure is created:

 - <processtitle>.xml
 - ocr/<processtitle>_<ocr format>/ (created even if there is no OCR data; <ocr format> can be alto or tei, depending on the OCR system used)
 - images/scans_tif/ (containing the image files)

The folder configuration looks like this:

OCR (ALTO): [screenshot: Screenshot_2021-10-14_09-32-36]

LOCAL: [screenshot: Screenshot_2021-10-14_09-33-10]

It looks to me as if the folder names of the internal structure are taken over one-to-one on export.

Is there a way to change the naming of the exported directories?

matthias-ronge commented 2 years ago

In 3.x, on export, a folder is created for the process, and every folder that is set to “copy on export” in the folder settings of the project is copied into it. There is currently no way to change this.

So, given this metadata directory:

📁 metadata/
  📁 42/
    📁 images/
      📁 1234567X_tif/  [SOURCE]
    📁 jpgs/
      📁 default/       [DEFAULT]
      📁 thumbs/        [THUMBS]
    📁 ocr/
      📁 alto/          [TXT]
      📁 pdf/           [PDF]
    📄 meta.xml

If you choose DEFAULT, THUMBS and PDF to be exported, you will get the following in the hotfolder:

📁 hotfolder/
  📁 1234567X/
    📁 jpgs/
      📁 default/       [DEFAULT]
      📁 thumbs/        [THUMBS]
    📁 ocr/
      📁 pdf/           [PDF]
    📄 1234567X.xml

There is currently no way to configure different names for the folders in the hotfolder; they will be the same as in the process directory. There is currently no way to skip copying a folder when it is empty, either.

To change the names of the exported folders, you have to change the names of the internal folders.
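The behaviour described above can be illustrated with a minimal Python sketch. It is not Kitodo.Production's actual code: the function name `export_process`, its parameters, and the idea of passing the selected folders as relative paths are assumptions made purely for illustration.

```python
import shutil
from pathlib import Path
from typing import Iterable


def export_process(process_dir: Path, hotfolder: Path, title: str,
                   export_folders: Iterable[str]) -> Path:
    """Copy the folders selected for export into hotfolder/<title>/.

    Hypothetical helper that only mirrors the behaviour described in
    this thread; it is not part of Kitodo.Production.
    """
    target = hotfolder / title
    target.mkdir(parents=True, exist_ok=True)
    for relative in export_folders:
        # Folder names are taken over unchanged, and a folder is
        # copied even when it is empty.
        shutil.copytree(process_dir / relative, target / relative,
                        dirs_exist_ok=True)
    # The metadata file is exported under the process title.
    shutil.copy2(process_dir / "meta.xml", target / f"{title}.xml")
    return target
```

Calling `export_process(Path("metadata/42"), Path("hotfolder"), "1234567X", ["jpgs/default", "jpgs/thumbs", "ocr/pdf"])` would reproduce the hotfolder layout shown above, which is why the exported names can only change if the internal names change.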

henning-gerhardt commented 2 years ago

Changing the names of the internal folders is not really feasible, as I would then have to change them for over 480,000 processes.

So I need time to adjust the post-processing application that provides the processed data to our presentation system to these new "requirements".
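As a rough idea of what such an adjustment could look like, here is a minimal Python sketch that rearranges one 3.x export folder into the 2.x layout. The function name `to_2x_layout` is hypothetical, and the sketch assumes the folder names mentioned in this issue (`images/scans_tif/` and `ocr/<processtitle>_<ocr format>/`) and at most one OCR format per process; other folder configurations would need different mappings.

```python
import shutil
from pathlib import Path


def to_2x_layout(export_dir: Path, title: str) -> None:
    """Rearrange a 3.x export folder in place to resemble the 2.x layout.

    Hypothetical helper; folder names are assumptions taken from this issue.
    """
    # images/scans_tif/ -> <title>_tif/
    images = export_dir / "images" / "scans_tif"
    if images.is_dir():
        shutil.move(str(images), str(export_dir / f"{title}_tif"))

    # ocr/<title>_<format>/ -> <title>_ocr/, only if it holds any OCR data
    ocr_root = export_dir / "ocr"
    if ocr_root.is_dir():
        for ocr_dir in sorted(ocr_root.iterdir()):
            if (ocr_dir.is_dir() and ocr_dir.name.startswith(f"{title}_")
                    and any(ocr_dir.iterdir())):
                shutil.move(str(ocr_dir), str(export_dir / f"{title}_ocr"))
                break  # assumption: one OCR format per process

    # Remove empty leftovers only; anything that still has content is kept.
    for parent in (export_dir / "images", ocr_root):
        if parent.is_dir():
            for child in list(parent.iterdir()):
                if child.is_dir() and not any(child.iterdir()):
                    child.rmdir()
            if not any(parent.iterdir()):
                parent.rmdir()
```

Because only empty leftover folders are deleted, running this over an export with an unexpected extra folder leaves that folder in place rather than discarding data.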