kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

File pointers in exported METS get duplicated if files from different file groups have the same file path #4916

Open BartChris opened 2 years ago

BartChris commented 2 years ago

I am not sure whether this is intended behaviour, but consider the following setup:

I have defined three file groups: ORIGINAL, DEFAULT and THUMBS. Images in the ORIGINAL file group are stored as TIFF, those in DEFAULT and THUMBS as JPEG. None of these file groups has a METS path defined in the project settings; see, for example, the settings for the ORIGINAL file group.

[Screenshot: settings of the ORIGINAL file group]

In the internal meta.xml everything looks fine. But when the process is exported, the physical structure section duplicates the UUID of the thumbnail. The system seems to be able to tell the TIFF images (ORIGINAL) apart from the JPEG files (DEFAULT, THUMBS), but not the DEFAULT JPEGs from the THUMBS JPEGs.

[Screenshot: physical structure of the exported METS file with the duplicated UUID]

The DEFAULT JPEG file is encoded like this:

[Screenshot: encoding of the DEFAULT JPEG file]

The THUMBS JPEG file is encoded like this:

[Screenshot: encoding of the THUMBS JPEG file]

If I then define an individual METS path for each file group, e.g. /images/tif/, /images/jpg/, /images/thumbs, it seems to work, and all images of all file groups are referenced in the physical structure:

[Screenshot: physical structure with all images of all file groups referenced]

During export, there seems to be a hard requirement that the images in the different file groups can be distinguished by their media type or by their METS path. Or is this a bug?

matthias-ronge commented 2 years ago

I don't understand the point here. The exported METS file is supposed to reference two file groups (namely DEFAULT and THUMBS) with two different image versions. If you don't specify a path here, it will refer to the same file (00000001.jpg) twice! Do you also use the DEFAULT image as a thumbnail?

It is not yet defined what should happen when a file group without a specified METS pointer path is exported. For the DFG Viewer to work, the file group must specify the absolute path to the images, because the DFG Viewer later retrieves them from a server (i.e. in the external view). The requirement there is that the thumbnail is 150 pixels in size and the DEFAULT image has the original resolution, so two different files must be referenced.

If you don't want to use the DFG Viewer, you are free to do as you like, but what information do you want to encode there?

Nonetheless, I agree that this is probably a bug. Apparently the UUID is looked up by file name.
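To illustrate the suspected cause, here is a minimal Python sketch. It is not Kitodo's actual export code, and all names in it are hypothetical: if file IDs are assigned through a lookup keyed only by the file path, the DEFAULT and THUMBS entries for 00000001.jpg collapse into one ID, whereas keying on file group plus path keeps them apart.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaFile:
    file_group: str  # e.g. "ORIGINAL", "DEFAULT", "THUMBS"
    href: str        # path written to the exported METS file


def ids_keyed_by_href(files):
    """Suspected behaviour: one ID per path, so equal paths share an ID."""
    ids = {}
    for f in files:
        ids.setdefault(f.href, f"FILE_{uuid.uuid4()}")
    return [ids[f.href] for f in files]


def ids_keyed_by_group_and_href(files):
    """Alternative: key on (file group, path), so each group keeps its own ID."""
    ids = {}
    for f in files:
        ids.setdefault((f.file_group, f.href), f"FILE_{uuid.uuid4()}")
    return [ids[(f.file_group, f.href)] for f in files]


# One page without METS paths: DEFAULT and THUMBS both point to 00000001.jpg.
page = [
    MediaFile("ORIGINAL", "00000001.tif"),
    MediaFile("DEFAULT", "00000001.jpg"),
    MediaFile("THUMBS", "00000001.jpg"),
]

print(len(set(ids_keyed_by_href(page))))            # 2
print(len(set(ids_keyed_by_group_and_href(page))))  # 3
```

With distinct METS paths per file group, the paths differ and both variants yield three IDs, which matches the observation that defining METS paths makes the problem disappear.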

BartChris commented 2 years ago

Thank you for the detailed answer. You are of course correct that in this form the file groups do not make any sense. So let me explain how I got there. I could not use the following interface in Kitodo to define the link structure I was striving for:

[Screenshot: the corresponding settings interface in Kitodo]

I tried to enter URLs there that follow the URL scheme of a typical IIIF-compliant image server:

https://iiif.io/api/image/3.0/

{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

It appears to me that the Kitodo interface allows you to define a base URL, to which the file path is then appended. In a typical IIIF URL, however, the size specification comes after the identifier. So I tried to generate the path outside of Kitodo (with XSLT or a custom script). For that, it is easiest for me to keep the paths as plain 00000001.jpg and convert them to the image server path I need.
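The mismatch becomes clear when the IIIF template above is filled in programmatically: the identifier sits in the middle of the URL, so a base URL plus an appended file path cannot produce it. Below is a minimal Python sketch of such a conversion outside Kitodo. The server name, the prefix and the rule "identifier = file name without extension" are assumptions for illustration, not Kitodo behaviour.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def iiif_image_url(server, prefix, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg", scheme="https"):
    """Fill in the IIIF Image API 3.0 template
    {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier must be URI-encoded; a "/" inside it becomes %2F.
    ident = quote(identifier, safe="")
    prefix_part = f"/{prefix.strip('/')}" if prefix else ""
    return (f"{scheme}://{server}{prefix_part}/{ident}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


def identifier_from_mets_path(path):
    """Hypothetical mapping from the path kept in METS to the server identifier:
    here simply the file name without its extension (00000001.jpg -> 00000001)."""
    return PurePosixPath(path).stem


# A 150-pixel-wide thumbnail derived from the plain file name.
print(iiif_image_url("server.example", "imageserver",
                     identifier_from_mets_path("00000001.jpg"), size="150,"))
# https://server.example/imageserver/00000001/full/150,/0/default.jpg
```

Keeping the METS paths as plain file names, as described above, keeps `identifier_from_mets_path` trivial; any richer identifier scheme would only change that one function.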

matthias-ronge commented 2 years ago

Full support for an image server is the topic of issue #3371. However, do I understand you correctly that what is needed here is some placeholder for the image name?

https://server.example/imageserver/(filename)/de-DE/100/0/100.jpg
                                   ^^^^^^^^^^

Maybe there should be several variables: for the relative path (jpgs/max/00000001.jpg), the plain file name (00000001.jpg), the file name without extension (00000001), and the "canonical part" (the same, except in some special setups).
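The proposed variables could be expanded roughly as follows. This is a minimal Python sketch, not an existing Kitodo feature: the placeholder names `(relativepath)`, `(filename)` and `(filename_without_extension)` are hypothetical, the "canonical part" is left out because it depends on the local setup, and unknown placeholders are left untouched.

```python
import re
from pathlib import PurePosixPath


def expand_placeholders(template, relative_path):
    """Replace hypothetical (variable) placeholders in a URL template."""
    path = PurePosixPath(relative_path)
    values = {
        "relativepath": str(path),                # jpgs/max/00000001.jpg
        "filename": path.name,                    # 00000001.jpg
        "filename_without_extension": path.stem,  # 00000001
    }

    def substitute(match):
        # Keep unknown placeholders as they are, e.g. for later post-processing.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\(([a-z_]+)\)", substitute, template)


print(expand_placeholders(
    "https://server.example/imageserver/(filename)/de-DE/100/0/100.jpg",
    "jpgs/max/00000001.jpg"))
# https://server.example/imageserver/00000001.jpg/de-DE/100/0/100.jpg
```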

BartChris commented 2 years ago

Implementing the ideas discussed in https://github.com/kitodo/kitodo-production/issues/3371 would be great. I suppose they mostly address the use of an image server inside Kitodo. As this is not possible right now, we definitely need the derivatives inside Kitodo at this point. In our presentation, however, we would like to use an image server to deliver different resolutions.

I am not entirely sure whether the URL for the image server can be constructed from within Kitodo under the current conditions, since information from outside Kitodo might be necessary. In our case, we plan to use the Kitodo project title and the name of the image (without the extension), so something like this:

https://server.example/imageserver/(projecttitle)_(filename_without_extension)/full/100/0/default.jpg

This might be possible, but I am not sure whether you can cover every use case of every user. Adding the file name as a variable would be great, though.

matthias-ronge commented 2 years ago

You could then still post-process your URLs, for example:

https://server.example/imageserver/zzzREPLACEMELATERzzz_(filename_without_extension)/full/100/0/default.jpg
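Such a post-processing step, run on the exported METS file outside Kitodo, could look like the following minimal Python sketch. It assumes the marker ends up in the `xlink:href` attributes of `mets:FLocat` elements and that the caller supplies the project title; the function name and the sample document are hypothetical. The title is URL-encoded because it may contain spaces.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
MARKER = "zzzREPLACEMELATERzzz"

# Keep the usual prefixes when the tree is serialized again.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def replace_marker_in_hrefs(mets_xml, project_title):
    """Replace the marker in every mets:FLocat xlink:href with the project title."""
    root = ET.fromstring(mets_xml)
    href_attr = f"{{{XLINK_NS}}}href"
    replacement = quote(project_title, safe="")
    for flocat in root.iter(f"{{{METS_NS}}}FLocat"):
        href = flocat.get(href_attr)
        if href and MARKER in href:
            flocat.set(href_attr, href.replace(MARKER, replacement))
    return ET.tostring(root, encoding="unicode")


sample = f"""<mets:mets xmlns:mets="{METS_NS}" xmlns:xlink="{XLINK_NS}">
  <mets:fileSec>
    <mets:fileGrp USE="THUMBS">
      <mets:file ID="FILE_0001" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://server.example/imageserver/{MARKER}_00000001/full/100/0/default.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

print(replace_marker_in_hrefs(sample, "My Project"))
```

Parsing the XML instead of doing a plain text replacement restricts the change to file pointers, so a marker string elsewhere in the document is not touched.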