Closed andre-hohmann closed 4 months ago
In my opinion, this is a severe problem, because inconsistent internal metadata is created, which
@solth : According to @henning-gerhardt, i should inform you explicitly, because for me this is a blocker for the productive use of the affected versions in SLUB Dresden. Since I was informed about this behavior by Leipzig University Library, it seems to be relevant for other institutions as well.
The additional element processTitle
for all document types could possibly lead to considerable adjustments to the export XSLT in SLUB Dresden - regardless of the dmdSec
in which the element is inserted.
The reason for is the derivation of the values from the elements processTitle
and CatalogIDDigital
for the creation of the content of several MODS fields.
processTitle
for the document types newspaper issues, newspaper year, ephemera year and ephemera issues. CatalogIDDigital
for the other document types, as for example monograph, manuscript, volume, ...If both elements are present in all METS files, the existing rules must be adapted, as a new combination exists now: processTitle
AND CatalogIDDigital
. Until now, only the combination processTitle
OR CatalogIDDigital
exists.
I wonder if it is possible to make the creation of processTitle
for monographs, ... configurable to avoid adjustments in the XSLT.
For SLUB Dresden, the metadata element processTitle` is not necessary for most document types. As all of our current processes for monographs, manuscripts, ... do not contain the element, i cannot imagine a scenario to apply it.
In my opinion, the problem lies in explicitly calling addAllowedMetadataRecursive
for the physical structure here.
When i comment out this line
addAllowedMetadataRecursive(workpiece.getPhysicalStructure(), keysForProcessTitle, processTitle);
the described generation of an additional DMDSec does not occur. If we wish to maintain this functionality, we should definitely avoid attaching metadata to the wrapper element for the actual pages, which is what happens here. Additionally, I am not even sure why we need to call that method for the physical structure here at all. Is there really a need to add the process title as metadata to a page, for example? I have already noted the problematic effects of accidentally adding metadata to pages here: https://github.com/kitodo/kitodo-production/issues/4921. The method in question is also called only during the creation of the process when (usually) no pages exist.
So i would only call
addAllowedMetadataRecursive(workpiece.getLogicalStructure(), keysForProcessTitle, processTitle);
and not
addAllowedMetadataRecursive(workpiece.getPhysicalStructure(), keysForProcessTitle, processTitle);
@andre-hohmann I added the blocking
label to signal its severity.
@BartChris I think there are definitely cases where adding metadata for individual pages is required, for example information about people depicted only in some images in a process.
@matthias-ronge can you see whether the proposed fix would pose any unwanted side effects? Is there a reason for calling the code line in question for the physical structure?
@solth : thanks a lot! Maybe, for other institutions this issue might not be blocking, so both the issue and the label can be discussed. I would be very interested in the opinion of other users of Kitodo.Production.
I wanted to point out that the current implementation can lead to increased efforts for the export XSLT.
I have not examined this in detail, but the main problem is that processTitle
is always added. The deletion of the dmdSec
via XSLT is difficult, because the DMDID
in <mets:structMap TYPE="PHYSICAL">
needs to be deleted, too. Even if you apply other scripts, these efforts are in my opinion not really necessary. The export itself is complicated enough.
If the processTitle
is exported to avoid the deletion of dmdSec
, you probably will receive warnings from the DDB analysis report, because the dmdSec
does not contain any title information.
So, this might be complicated and for SLUB Dresden, i would first analyse the possible consequences and i want to avoid as many "repairs" as possible. For the other issue, it is clear that a wrong process title is assigned to superordinate processes:
I agree that it can/should be possible to apply metadata for a specific page, but these metadata is probably assigned on purpose and does not lead to "unwanted" dmdSec
. Thus, this does not cause any problems or at least, i cannot imagine any.
In addition: The concept seems inconsistent to me because processTitle
for newspaper issues, newspaper year, ... is not allocated to <mets:structMap TYPE="PHYSICAL">
, but to <mets:structMap TYPE="LOGICAL">
.
Here is short update with some new insights:
1 After consultation with @subhhwendt , i have learned, that this behavior only occurs, if processTitle
is configured in the ruleset. As far as i know, this is only necessary, if processes for newspapers are created.
Probably, only institutions are affected, which create processes for newspaper and non-newspaper-materials.
2 Additionally irritating is, that the processTitle
in the new dmdSec
is not shown in the metadata editor and cannot be deleted or corrected there. If it could be deleted, it would be an option to solve the problem.
Here is another suggestion to reduce the effect of the functional extension of key="processTitle"
- in addition, that no additional dmdSec
is created.
key="processTitle"
should only be created, if it is explicitly assigned in the rule set to the document type or structure element.
Then, for each institution it can be decided by demand, if the metadata is created or not.
Example:
<restriction division="NewspaperYear" unspecified="forbidden">
<permit division="NewspaperMonth"/>
<permit key="LABEL" maxOccurs="1"/>
<permit key="ORDERLABEL" maxOccurs="1"/>
<permit key="processTitle" maxOccurs="1"/>
</restriction>
If I understand correctly, this is just a configuration problem? If a metadata key has been marked as use="processTitle"
and this metadata key is allowed in the ruleset for a division, then the process title is written there while the process is created. So far so logical.
So there are two options:
use="processTitle"
at all. Then the function is off.In this specific case, perhaps only one rule is missing in the ruleset, which does not allow metadata keys on the physSequence
division:
<restriction division="physSequence" unspecified="forbidden">
<permit division="page"/> <!-- paged media -->
<permit division="track"/> <!-- a/v media -->
<permit division="other"/> <!-- other media (like 3D objects) -->
<!-- permit division="partitialOrder"/ --> <!-- possibly other divisions
if you work with two
structure trees / partial
orders -->
<!-- NO METADATA keys at all! -->
</restriction>
This would probably require the implementation of https://github.com/kitodo/kitodo-production/issues/4097
We definitely need documentation for the configuration of page
and physSequence
(once implemented) in the ruleset. Their absence can lead to unintended consequences if not properly configured. The issue lies not only in the lack of documentation but also in the fact that many institutions may not require the assignment of metadata elements to the physical structure. This aspect of adding metadata to the physical structure is also is not addressed in the METS-Anwendungsprofil
.
So the standard use case might never need this. While having the option is beneficial, improper configuration may result in unwanted DMDSecs. Therefore, it's crucial to ensure correct configuration by giving out this info in the documentation to avoid such issues.
I absolutely agree with @BartChris regarding the documentation.
We need "processTitle" for newspaper/ephemera processes (year and issues) and i only allow it for those division, but "processTitle" is still assigned to all processes.
Maybe the problem lies within the missing physSequence
division in our ruleset.
Thanks for the hint! I will check this as soon as possible.
@andre-hohmann If i understand this issue correctly it is impossible right now to define rules for the physSequence
.
https://github.com/kitodo/kitodo-production/issues/4097
I have an additional question: Where in the UI would the user be able to edit the metadata for the element physSequence
even if we make it configurable via the ruleset? And what kind of metadata would be typically added to the physSequence
? I am still struggling seeing a real use case for that.
@BartChris :
If i understand this issue correctly it is impossible right now to define rules for the physSequence. https://github.com/kitodo/kitodo-production/issues/4097
Thanks! Then, i misunderstood @matthias-ronge hint:
Unfortunately, i will not be able to test changes of the configuration files.
Or could it be, that in #4097 the aim is to create a use
-attribute to define physSequence
as root of the structure tree?
And what kind of metadata would be typically added to the physSequence? I am still struggling seeing a real use case for that.
I have neither an idea nor a use case for that, too.
I have an additional question: Where in the UI would the user be able to edit the metadata for the element physSequence even if we make it configurable via the ruleset?
Again, i have no idea and i think this should be worked out by the ones who wants to add metadata to the physSequence. Of course, this should be done in a way, that does not influence the current behavior in a negative way for the others users.
To avoid misunderstandings:
I would be glad, if processTitle
would be in general applied to <mets:structMap TYPE="LOGICAL">
.
If other ones need it in physSequence
, i am fine with it, as long as i do not have to apply it.
For me, it seems, as if the concept of implementation is not complete and/or the documentation is missing.
Probably, this should be discussed in a virtual conference, to find a solution.
And we have definitely adjust ruleset_default.xml
, because with the current version, the described behavior occurs.
In this specific case, perhaps only one rule is missing in the ruleset, which does not allow metadata keys on the
physSequence
division:<restriction division="physSequence" unspecified="forbidden"> <permit division="page"/> <!-- paged media --> <permit division="track"/> <!-- a/v media --> <permit division="other"/> <!-- other media (like 3D objects) --> <!-- permit division="partitialOrder"/ --> <!-- possibly other divisions if you work with two structure trees / partial orders --> <!-- NO METADATA keys at all! --> </restriction>
I just checked. This does unfortunately not solve the issue (in Kitodo 3.6.2).
Thanks!
Then we should definitely apply your suggestion.
If the physSequence
should be applied, all necessary requirements should be available.
I just tested this: The root division of the physical structure tree has no type internally. That makes sense to a certain extent, but then the restriction must read accordingly:
<restriction division="" unspecified="forbidden">
<permit division="page"/>
<permit division="track"/>
<permit division="other"/>
</restriction>
It's definitely not intuitive, but it works for me.
Still, this is a bug; but I think it would be better to implement #4097 than to change the code here now.
I have an additional question: Where in the UI would the user be able to edit the metadata for the element
physSequence
even if we make it configurable via the ruleset? And what kind of metadata would be typically added to thephysSequence
? I am still struggling seeing a real use case for that.
Is this possible if you use two structure trees? As a non-user, of course I don't know what kind of metadata is recorded there. But maybe that makes sense. For example, the physical media type, like hardcover or paperback. Or a former sales price. Or for a video cassette, the video format NTSC, PAL or SECAM, the tape type VHS, VHS-C, S-VHS, maybe the tape length in minutes, etc. Just things that you want to capture for the entire physical medium.
I just tested this: The root division of the physical structure tree has no type internally. That makes sense to a certain extent, but then the restriction must read accordingly:
<restriction division="" unspecified="forbidden"> <permit division="page"/> <permit division="track"/> <permit division="other"/> </restriction>
It's definitely not intuitive, but it works for me.
Still, this is a bug; but I think it would be better to implement #4097 than to change the code here now.
I tested your suggestion: Adding this restriction for divisions without a type prevents the addition of an additional DMDSec, so it could be used as a workaround. If it does not have any unwanted side effects of course.
Is this possible if you use two structure trees
Are you talking about the extended Structure tree? That makes sense, but it seems like it is not possible to edit it there. I have the processTitle in the DMDSec for the physSequence
if i do not apply the ruleset restriction:
<mets:dmdSec ID="uuid-1f89f1fc-48d1-34d9-b02c-c9ba849a748b">
<mets:mdWrap>
<mets:xmlData>
<kitodo:kitodo>
<kitodo:metadata name="processTitle">retro_123456_000001</kitodo:metadata>
</kitodo:kitodo>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
The metadatum however does not appear in the editor.
I can also not add any metadatum to the "Division without type" in the editor, but this is a different issue i suppose...
We can assert the following:
There exists a workaround to prevent the creation of additional DMDSecs for the physSequence when utilizing the use="processTitle" attribute. However, if the process is generated with pages (via prestructured import), additional measures may be required to prevent the addition of the processTitle
to every page. Unfortunately, I haven't been able to test the latter scenario.
In my tests in was not possible to edit metadata assigned to the physSequence
in the extended Structure tree. So we would not really have a way to correct the unwanted DMDSECs for the user.
It's up to the institutions to determine whether this workaround is sufficient for their needs and if it would enable them to utilize Kitodo 3.6 and a released 3.7, even if the fix for #4097 is not implemented.
In my opinion, the current situation is not entirely satisfactory.
I propose that we extend the documentation and the example ruleset, advising institutions to implement the following restrictions to mitigate potential side effects. These restrictions should continue to be applied even after a fix for #4097:
<restriction division="" unspecified="forbidden">
<permit division="page"/>
<permit division="track"/>
<permit division="other"/>
</restriction>
My advice would also be to enable restrictions on pages:
<restriction division="page" unspecified="forbidden">
<permit key="ORDERLABEL" maxOccurs="1"/>
</restriction>
While Kitodo's flexibility is a significant advantage, I suggest considering whether it might be better not to allow the assignment of metadata to the physical structure (physSequence
and page
) by default. Metadata assignment to these elements should be explicitly enabled in the ruleset to avoid institutions inadvertently creating unwanted DMDSecs. A wrongly configured ruleset should not lead to side effects which might be discovered too late in the process.
Thanks a lot for the analysis and the summary!
We can assert the following:
- ...
In my opinion, the current situation is not entirely satisfactory.
I absolutely agree with you and in my opinion, this analysis should be conducted and these question should posed, before the functionality is implemented.
I'm surprised that there don't seem to be any specific use cases or requirements for this change, which causes quite a lot of effort to clarify.
As soon as possible, i will test the changes in the SLUB Dresden and hope that it will work for us as well.
I would also like to draw attention to the following issue that will probably not be resolved by adjusting the ruleset:
The workaround described in the following comment works for me as well and so far no negative side effects have been identified.
I will therefore remove the label blocking
.
I will also close the issue because @BartChris has created a discussion with the current state of knowledge and in which the next steps are to be determined.
Depending on these results, a new issue should be created in which the necessary actions are described.
Describe the bug If a new process is created, the metadata element
processTitle
is written in an own<mets:dmdSec>
in the internal METS file. This element is connected with the first<mets:div>
-element in<mets:structMap TYPE="PHYSICAL">
- instead of<mets:structMap TYPE="LOGICAL">
. This leads to some empty metadata elements in the exported METS-file. Probably, this depends on the XSLT configuration.This behavior can be observed for processes of the type manuscript, monographs, volumes, ... For processes of the type newspaper year, newspaper issue, ephemera year and ephemera issue, the metadata
processTitle
is written in the existing<mets:dmdSec>
, which is connected with<mets:structMap TYPE="LOGICAL">
.Probably, this is caused by:
4459
See also:
6025
To Reproduce Steps to reproduce the behavior:
<mets:dmdSec>
-elementsExpected behavior The metadata
processTitle
should always be written in the existing<mets:dmdSec>
, which is connected with<mets:structMap TYPE="LOGICAL">
. According to the behavior of the processes of the type newspaper, ephemera (tile, year, issue).Probably, this must be analyzed more deeply. It seems there are more consequences:
Release 3.6.2 - probably since 3.6.0
Desktop (please complete the following information):
Additional context
Internal METS file: Manuscript
``` xmlInternal METS file: Volume
``` xmlInternal METS file: Issue (Ephemera)
``` xml