kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

Newspaper Calendar Function - Exported METS does not include "type" for "month" and "day" #4362

Open stefanCCS opened 3 years ago

stefanCCS commented 3 years ago

Using version 3.2.1-build 24.03.21. Update: the same happens with the official release 3.3!

I am using the newspaper calendar function to create processes. For this, my ruleset contains:

       <division id="newspaper" processTitle="MODSrecordIdentifier" use="createChildrenWithCalendar" withWorkflow="false"> 
           <label>newspaper</label>
            <label lang="de">Zeitung</label>
            <subdivisionByDate>
                <division dates="ORDERLABEL" id="year" scheme="yyyy" processTitle="+'-'+#YEAR" withWorkflow="false"/>
                <division dates="ORDERLABEL" id="month" scheme="yyyy-MM"/>
                <division dates="ORDERLABEL" id="day" scheme="yyyy-MM-dd"/>
            </subdivisionByDate>
        </division>

        <division id="issue"> <!-- DFG-Viewer compatible -->
            <label>Newspaper (one Issue)</label>
            <label lang="de">Zeitung (eine Ausgabe)</label>
        </division>

        <restriction division="newspaper" unspecified="forbidden">
            <permit division="year"/>
            <permit key="MODSrecordIdentifier" maxOccurs="1" minOccurs="1"/>
            <permit key="MODStitle" minOccurs="1"/>
        </restriction>
        <restriction division="year" unspecified="forbidden">
            <permit division="month"/>
            <permit key="ORDERLABEL" minOccurs="1"/>
        </restriction>
        <restriction division="month" unspecified="forbidden">
            <permit division="day"/>
            <permit key="ORDERLABEL" minOccurs="1"/>
        </restriction>
        <restriction division="day" unspecified="forbidden">
            <permit division="issue"/>
            <permit key="ORDERLABEL" minOccurs="1"/>
        </restriction>
        <restriction division="issue" unspecified="forbidden">
            <permit key="LABEL" />            
            <permit key="MODStitle" minOccurs="1"/>

The process generation function runs fine - it creates hierarchical processes as follows:

The DMS export of an issue creates exported METS files for this issue, its corresponding year, and the title. So far, so good. The exported METS for the title has mptr references to the "year" METS files, which is also good. The exported "year" METS has the full logical hierarchy (title, year, month, day, issues), with the issues having mptr references to the "issue" METS, which is also good.

The "issue" METS also has the full logical hierarchy (title, year, month, day, issues). BUT: for "month" and "day", the TYPE attribute is missing. Example:

   <mets:structMap TYPE="LOGICAL">
      <mets:div ID="uuid-0c32fd94-633f-4b94-9492-e4e7d46ebb19" TYPE="newspaper">
         <mets:mptr xlink:href="http://dfgviewer.cloutodo.de/xyz/202104271421/202104271421.xml"
                    LOCTYPE="URL"/>
         <mets:div ID="uuid-fc4a56c1-719a-48ad-9118-15c9db3c9f65"
                   TYPE="year"
                   ORDERLABEL="2021">
            <mets:mptr xlink:href="http://dfgviewer.cloutodo.de/xyz/202104271421-2021/202104271421-2021.xml"
                       LOCTYPE="URL"/>
            <mets:div ID="uuid-78942ed7-e9be-4bfc-874f-03002bf588f2"
                      ADMID="uuid-fcb814c4-0269-3469-ba87-10a8a23e1662">
               <mets:div ID="uuid-c0873b96-708a-4ded-9e30-4ece9b4fd6c7" ORDERLABEL="2021-02-01">
                  <mets:div ID="uuid-4bbfe98a-2eb3-4c1c-8452-4a18af9ba685"
                            DMDID="uuid-196280cc-a908-3292-87d6-516740879349"
                            TYPE="issue"
                            ORDER="1"/>
               </mets:div>
            </mets:div>
         </mets:div>
      </mets:div>
   </mets:structMap>

=> This does NOT depend on XSLT processing during export (the example above is based on a simple XSLT copy-all).

==> Please check this behavior, as it has negative side effects, e.g. in how the structure is shown in DFG-Viewer.
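
For comparison, here is a sketch of the output this report asks for. It is the fragment above with only the two TYPE attributes added; the mets:mptr elements are left out for brevity. That the values should be exactly "month" and "day" (mirroring the division ids in the ruleset) is an assumption.

```xml
<!-- Same fragment as above; mets:mptr elements omitted for brevity. -->
<mets:structMap TYPE="LOGICAL">
   <mets:div ID="uuid-0c32fd94-633f-4b94-9492-e4e7d46ebb19" TYPE="newspaper">
      <mets:div ID="uuid-fc4a56c1-719a-48ad-9118-15c9db3c9f65"
                TYPE="year"
                ORDERLABEL="2021">
         <!-- Added: TYPE="month" (assumed value, taken from the ruleset division id) -->
         <mets:div ID="uuid-78942ed7-e9be-4bfc-874f-03002bf588f2"
                   ADMID="uuid-fcb814c4-0269-3469-ba87-10a8a23e1662"
                   TYPE="month">
            <!-- Added: TYPE="day" (assumed value, taken from the ruleset division id) -->
            <mets:div ID="uuid-c0873b96-708a-4ded-9e30-4ece9b4fd6c7"
                      TYPE="day"
                      ORDERLABEL="2021-02-01">
               <mets:div ID="uuid-4bbfe98a-2eb3-4c1c-8452-4a18af9ba685"
                         DMDID="uuid-196280cc-a908-3292-87d6-516740879349"
                         TYPE="issue"
                         ORDER="1"/>
            </mets:div>
         </mets:div>
      </mets:div>
   </mets:div>
</mets:structMap>
```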

matthias-ronge commented 3 years ago

From a logical perspective, what has been described is correct. The day and month as a logical unit are located in the annual METS file, and only there. In the output METS file there are containers that trace the structure, but these are not the day or month. This is also how the data is saved internally, and it is output without change.

Question: If you need the type attributes with these values at this point, can you add them via XSLT?

If not, the program code would have to be adapted so that the type values are copied when exporting.

stefanCCS commented 3 years ago

I assume that it might be possible (even if not very easy) to add this TYPE attribute for "day" and "month" via XSLT. BUT I would very much prefer to do this in the software itself (during process generation), because 1) in my opinion it is a general feature needed by everybody, as having this TYPE attribute is a clear requirement of the DFG-Viewer, and 2) it also looks "strange" in the metadata editor as it is today ("ohne Typ", i.e. "without type", see below). Creating this correctly during process creation would also improve this behavior.

[Screenshot: metadata editor showing "ohne Typ"]

matthias-ronge commented 3 years ago

Let me open a parenthesis here. ❪ The representation in the metadata editor is not yet very nice. The structures from three METS files are displayed one below the other (you can see this from the missing indentation: each one starts again at the left edge).

[Screenshot: the three structures shown one below the other]

The primary purpose of this is to make the structures of the higher-level processes visible at all, so that you can see what the process belongs to. However, that is not nice, and I think the goal here should be an aggregated and nested representation, something like this:

[Image: aggregated view]

The nameless container elements should not be displayed at all, because they are purely technical and only serve internal management. The question of representation is, however, a separate topic and independent of the question of export. ❫

Why your suggestion is not a good one: there is a rule of good software design that says every piece of information should be stored in only one place if possible. It is a question of consistency: if you change it in one place, you have to change all the other places as well, or you get an inconsistency, which you want to avoid. Conversely, if the program finds a state in which different files hold different values for the same piece of information, which one is correct? Therefore, it is best not to store such copies. The required export format should be generated only during export, and if values have to be copied to other locations, that is the place to do it.

Imagine you had these copies of values: you would then have to change all subordinates whenever you change the superordinate (this would affect series, for example). Yes, it can be done. But then you have to make sure that no other user is working on the subordinates at the same time. Initially, we wanted to go exactly that way, and we had programmed a very complicated system for this in the past, which we then removed because it was too complicated to manage.

If this is not possible in XSLT (the main question is: can XSLT access the values from the parent METS file?), then we need to change the export.
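
On the main question: XSLT 1.0 does define a `document()` function that loads another XML file, so in principle a stylesheet can follow the `mets:mptr` reference to the parent METS. The following is only a sketch under several assumptions that are not confirmed in this thread: that the XSLT processor used during export is allowed to fetch the `xlink:href` URL, that the parent file is already available at that URL when the transformation runs, and that the parent METS contains a div with the same ORDERLABEL that carries the TYPE. The variable names are made up for this example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">

    <!-- Identity template: copy everything that no other template handles. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Typeless container that carries an ORDERLABEL
         (the container with ORDERLABEL="2021-02-01" in the example). -->
    <xsl:template match="mets:structMap[@TYPE='LOGICAL']//mets:div[not(@TYPE)][@ORDERLABEL]">
        <!-- Nearest ancestor div that points to its own METS file (the year in the example). -->
        <xsl:variable name="href"
                      select="ancestor::mets:div[mets:mptr][1]/mets:mptr/@xlink:href"/>
        <xsl:variable name="label" select="@ORDERLABEL"/>
        <!-- Assumption: the parent METS has a div with the same ORDERLABEL and a TYPE. -->
        <xsl:variable name="type"
                      select="document($href)//mets:structMap[@TYPE='LOGICAL']//mets:div[@ORDERLABEL = $label]/@TYPE"/>
        <xsl:copy>
            <!-- Only add TYPE if the lookup in the parent file found one. -->
            <xsl:if test="$type">
                <xsl:attribute name="TYPE"><xsl:value-of select="$type"/></xsl:attribute>
            </xsl:if>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Note that this lookup only works for containers that have an ORDERLABEL of their own. In the example above, the container below the year has none, so it would need a different key, and fetching a remote file for every export may be slow or blocked by the processor configuration.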

stefanCCS commented 3 years ago

Many thanks for these explanations (and the additional information given via Skype). I have understood that the current design allows a very flexible logical structure, which is not limited to the "normal" Newspaper/Year/Month/Day/Issue, but can also be something like Newspaper/Year/Week/Issue (and many more). Therefore, during the creation of the "issue" process, the "root-div-level" is a kind of container and by definition not always a "month". As a result, the first opportunity to correct this is during export (which typically means via XSLT).

==> Even though I understand this, I would appreciate it very much if

stefanCCS commented 3 years ago

I think I could manage to add XSLT that generates the missing "month" and "day" TYPE in the "issue" METS during export (so that it is displayed better in DFG-Viewer):

    <!-- Fragment: assumes the surrounding stylesheet declares the mets namespace
         and contains an identity (copy-all) template. -->
    <!-- Typeless div directly above the issue: mark it as "day". -->
    <xsl:template match="/mets:mets/mets:structMap[@TYPE='LOGICAL']/mets:div/mets:div/mets:div/mets:div[not(@TYPE)][mets:div[@TYPE='issue']]">
        <xsl:copy>
            <xsl:attribute name="TYPE">day</xsl:attribute>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Typeless div two levels above the issue: mark it as "month". -->
    <xsl:template match="/mets:mets/mets:structMap[@TYPE='LOGICAL']/mets:div/mets:div/mets:div[not(@TYPE)][mets:div/mets:div[@TYPE='issue']]">
        <xsl:copy>
            <xsl:attribute name="TYPE">month</xsl:attribute>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

stefanCCS commented 3 years ago

Hi, here is a further improved version of this XSLT (it also adds an ORDERLABEL for "month" in the "issue" METS):

    <!-- ORDERLABEL of the day-level div (e.g. "2021-02-01"); empty if there is none. -->
    <xsl:variable name="orderlabel" select="/mets:mets/mets:structMap[@TYPE='LOGICAL']/mets:div/mets:div/mets:div/mets:div/@ORDERLABEL"/>

    <!-- Typeless div directly above the issue: mark it as "day". -->
    <xsl:template match="/mets:mets/mets:structMap[@TYPE='LOGICAL']/mets:div/mets:div/mets:div/mets:div[not(@TYPE)][mets:div[@TYPE='issue']]">
        <xsl:copy>
            <xsl:attribute name="TYPE">day</xsl:attribute>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Typeless div two levels above the issue: mark it as "month". -->
    <xsl:template match="/mets:mets/mets:structMap[@TYPE='LOGICAL']/mets:div/mets:div/mets:div[not(@TYPE)][mets:div/mets:div[@TYPE='issue']]">
        <xsl:copy>
            <xsl:attribute name="TYPE">month</xsl:attribute>
            <!-- Month ORDERLABEL = first 7 characters (yyyy-MM) of the day ORDERLABEL. -->
            <xsl:if test="$orderlabel">
              <xsl:attribute name="ORDERLABEL"><xsl:value-of select="substring($orderlabel,1,7)"/></xsl:attribute>
            </xsl:if>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
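
The fragments above are templates only. To run them, they need a surrounding stylesheet with the namespace declaration and an identity template (the "copy-all" mentioned earlier). Below is a minimal sketch of how that could look, assuming XSLT 1.0 and the standard METS namespace URI. The variable name `dayLabel` is made up for this example, and reading the ORDERLABEL relative to the month container instead of through a global variable is a variation, not what was posted above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mets="http://www.loc.gov/METS/">

    <!-- Identity template: copies every node and attribute not matched below. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Typeless div directly above the issue: mark it as "day". -->
    <xsl:template match="mets:structMap[@TYPE='LOGICAL']/mets:div/mets:div/mets:div/mets:div[not(@TYPE)][mets:div[@TYPE='issue']]">
        <xsl:copy>
            <xsl:attribute name="TYPE">day</xsl:attribute>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Typeless div two levels above the issue: mark it as "month". -->
    <xsl:template match="mets:structMap[@TYPE='LOGICAL']/mets:div/mets:div/mets:div[not(@TYPE)][mets:div/mets:div[@TYPE='issue']]">
        <!-- Variation (assumption): take the ORDERLABEL from the first child
             container of this month instead of from a global variable. -->
        <xsl:variable name="dayLabel" select="mets:div[@ORDERLABEL][1]/@ORDERLABEL"/>
        <xsl:copy>
            <xsl:attribute name="TYPE">month</xsl:attribute>
            <!-- yyyy-MM is the first 7 characters of a yyyy-MM-dd label. -->
            <xsl:if test="$dayLabel">
                <xsl:attribute name="ORDERLABEL"><xsl:value-of select="substring($dayLabel, 1, 7)"/></xsl:attribute>
            </xsl:if>
            <!-- Existing attributes are copied last, so an ORDERLABEL already
                 present on the container replaces the derived one. -->
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

As with the posted version, the templates match by depth, so they only fit the Newspaper/Year/Month/Day/Issue layout. A ruleset with a different hierarchy (for example with weeks) would need different match patterns.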