kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

Changes of image files are not written in METS file by "Read in pagination from images" #5285

Closed andre-hohmann closed 1 year ago

andre-hohmann commented 2 years ago

Describe the bug In Kitodo.Production 2.x, the function "Read in pagination from images" is used to adjust the METS file (<mets:fileSec>, <mets:structLink>, ...) to the current number of image files in the respective image folder. This is necessary, for example, after image files have been deleted or added. In Kitodo.Production 3.x the button exists, too (see: Examples), but it has no functionality. Thus, it seems that changes to image files cannot be applied to the METS file.

To Reproduce Steps to reproduce the behavior:

  1. Delete some image files of a specific process
  2. Generate the preview image files
  3. Open the metadata editor
  4. For the deleted image files, only empty frames are shown
  5. Still, the same number of image files is shown in the structure tree
  6. It is not possible to adjust the METS file to the current number of image files

Expected behavior If "Read in pagination from images" is clicked, the METS file is updated to match the current number of image files.
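To make the expected behavior concrete, here is a minimal sketch in Python of the comparison the button would need to perform. It is not Kitodo.Production code: the function name `diff_references` and the plain lists of file names are assumptions made for illustration only.

```python
def diff_references(referenced, on_disk):
    """Compare file names referenced in the METS file with files in the image folder.

    Both arguments are plain lists of file names (hypothetical inputs).
    Returns (missing, unreferenced):
      missing      - referenced in the METS file but no longer on disk
      unreferenced - present on disk but not yet referenced in the METS file
    """
    referenced_set = set(referenced)
    on_disk_set = set(on_disk)
    missing = [name for name in referenced if name not in on_disk_set]
    unreferenced = [name for name in sorted(on_disk) if name not in referenced_set]
    return missing, unreferenced


# Example: one scan was deleted and one was added in the file system.
missing, unreferenced = diff_references(
    ["0001.tif", "0002.tif", "0003.tif"],
    ["0001.tif", "0003.tif", "0004.tif"],
)
print(missing)       # references to remove from the METS file
print(unreferenced)  # references to add to the METS file
```

With these inputs, `0002.tif` would be reported as missing and `0004.tif` as unreferenced.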

Release

Desktop (please complete the following information):

solth commented 1 year ago

@andre-hohmann three questions:

  1. When you delete images via the file system, do you delete only the scans, or do you also manually remove the PNG copies for the thumbnail and detail views?
  2. The button label is wrong, because it does not perform any kind of pagination, but instead adds references to images to the METS file or removes them from it; what would be a more appropriate button label for this functionality in your opinion?
  3. Should the button indeed update the pagination after updating the file references in the METS file, or is that not required?
andre-hohmann commented 1 year ago

@solth : Good questions! I hope my answers are understandable.

1 We want to delete only the scans (original files). Reason: In Kitodo.Production 2.x, we delete only the scans (original files), and it would be helpful if this were possible in Kitodo.Production 3.x, too. It would be quicker, and it would prevent users from forgetting to delete the preview and thumbnail files (derivative files). Furthermore, users should have as few opportunities to manipulate data as possible, to avoid mistakes and problems.

2 You are absolutely right, and it would be perfect if the label could be adjusted to:

Other proposals are welcome!

3 The pagination in ORDERLABEL should not be updated! Only the number of image files should be updated. Pagination is lost only when files are deleted. Use cases:

It must be avoided that the intellectually created pagination information is lost. But it is not expected that the pagination is automatically adjusted to the changes of the files. Thus, after the files have been corrected, manual effort is needed to adjust the pagination correctly. The remaining pagination information supports the corrections.
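One possible reading of this rule, sketched in Python: the ORDERLABEL values stay attached to the page positions, the remaining files are re-assigned to those positions in order, and only the labels of surplus positions at the end are dropped. The function name `reassign_files` and the dictionary layout are hypothetical and only illustrate the idea; they are not taken from Kitodo.Production.

```python
def reassign_files(order_labels, files):
    """Pair the files found on disk with the existing page positions.

    order_labels: ORDERLABEL values of the existing pages, in page order.
    files: file names currently on disk, in the desired page order.
    Existing labels are kept by position and never recalculated; positions
    added beyond the old page count get no label (None).
    """
    pages = []
    for index, name in enumerate(files):
        label = order_labels[index] if index < len(order_labels) else None
        pages.append({"order": index + 1, "orderlabel": label, "file": name})
    return pages


# Example: the second of four scans was deleted in the file system.
pages = reassign_files(["1", "2", "3", "4"], ["0001.tif", "0003.tif", "0004.tif"])
for page in pages:
    print(page)
```

In this example the label "2" now sits on `0003.tif`, so the pagination from that position onward has to be checked manually, and the label "4" is lost because its position no longer has a file.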

solth commented 1 year ago

@andre-hohmann thanks for the response.

Just to avoid any misunderstandings:

  1. Does this mean that any other files corresponding to a removed scan (e.g. thumbnail, full-resolution PNG derivative, PDF etc.) should be deleted automatically when this button is clicked? If so, this should probably not happen if the manually deleted file belongs to a different file group than the one which has been set as "Image generation source" in the project settings, correct? (If I am not mistaken, for each page the metadata editor will generally expect to find a "media variant" for each file group configured in the project. Is that correct, @matthias-ronge ?)
  2. That would work, I guess, but as far as I understand the main goal is to adjust the METS file with the files actually found in the metadata sub folder belonging to the current process; so perhaps this METS adjustment should be reflected in the button label?
  3. Here we have to be careful that we do not misunderstand each other. Before and after the use cases you stated "The pagination in ORDERLABEL should not be updated!" and "It must be avoided that the intellectual created pagination information is lost. But it is not expected that the pagination is automatically adjusted to the changes of the files. Thus, after the correction of the files, manual efforts are needed to adjust the pagination correctly. The remaining pagination information supports the corrections." respectively. At first glance this reads as a contradiction to the use case description to me. In Use Case 2, for example, you wrote: "The pagination of the "old" file 41 is applied to the "new" file 41 (that is the "old" file 46) ...The pagination of the files 41-80 has to be checked and if necessary adjusted." Do you mean this new pagination is applied and adjusted manually, so not part of the functionality behind the button? I think it is very important that we document this as clearly as possible.

If I understand you correctly, pressing the (then newly labeled) button after deleting some images should then:

Clicking the button will not:

Is this correct?

andre-hohmann commented 1 year ago

1 Delete images via the file system

We have a misunderstanding here, because nothing should be deleted when the button is clicked.

When we delete image files in the file system, we proceed as follows:

  1. Task is set to "Scan" -> The link to the home directory is set
  2. The image files are deleted, added, ... in the file system and the checksum file is created
  3. Task "Scan" is finished (or the correction is solved) -> The link to the home directory is removed
  4. The image files are checked for validity with regard to long-term preservation
  5. The derivative files are generated from the original files
  6. The process is opened in the metadata editor with the derivative files

When the button is clicked, the METS file should only be updated according to the current number of original image files, i.e. the ones from which the derivative files are created. The number of generated image files should be the same.

andre-hohmann commented 1 year ago

2 Button label As discussed: we just use the proposed labels.

3 Update pagination

Clicking the button will not:

andre-hohmann commented 1 year ago

@solth , @henning-gerhardt : Thanks a lot for the discussion of this complex issue. I wrote down some results as answers to Arved's questions. I hope that they are correct and understandable. If not, please correct them, so that we can avoid misunderstandings in the future.

solth commented 1 year ago

In a meeting with @andre-hohmann and @henning-gerhardt we agreed on the following scope for this issue:

@andre-hohmann & @henning-gerhardt please add or correct any points I might have forgotten or not documented correctly here.

solth commented 1 year ago

I forgot one point we discussed: changes to the workpiece are only to be written to the meta.xml file when the user explicitly clicks the Save button, not implicitly after the action behind the now to be relabeled button is completed!
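A minimal Python sketch of this point, assuming a hypothetical in-memory `Workpiece` object whose `saved_files` attribute stands in for the content of meta.xml. The class and method names are invented for illustration and are not the Kitodo.Production API.

```python
class Workpiece:
    """In-memory workpiece; saved_files stands in for what meta.xml contains."""

    def __init__(self, files):
        self.files = list(files)        # state shown in the metadata editor
        self.saved_files = list(files)  # state persisted in meta.xml

    def update_media_references(self, on_disk):
        """Action behind the relabeled button: changes the in-memory state only."""
        self.files = sorted(on_disk)

    @property
    def dirty(self):
        """True while the editor holds changes that have not been saved."""
        return self.files != self.saved_files

    def save(self):
        """Explicit Save: only here is the persisted state overwritten."""
        self.saved_files = list(self.files)


workpiece = Workpiece(["0001.tif", "0002.tif"])
workpiece.update_media_references(["0001.tif"])
print(workpiece.dirty)        # unsaved changes exist after the button action
workpiece.save()
print(workpiece.saved_files)  # persisted state after the explicit Save
```

Keeping the button action free of side effects on meta.xml means the user can still leave the editor without saving if the result is not what was intended.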

andre-hohmann commented 1 year ago

@solth : Thanks a lot for the summary! I have nothing to add/correct. Since your summary and my answers match, I think we have the same understanding regarding how to proceed.

andre-hohmann commented 1 year ago

@solth :

the button label is wrong, because it does not perform any kind of pagination, but instead adds or removes references to images to or from the METS file; what would be a more appropriate button label for this functionality in your opinion?

@henning-gerhardt and I would suggest the following label for the button: