keeps / commons-ip

Commons IP is a project that provides a command-line tool and a Java library to validate and manipulate E-ARK Information Packages. It can create or process E-ARK SIPs and AIPs and validate them against the official specifications.
http://keeps.github.io/commons-ip/
GNU Lesser General Public License v3.0

The validator does not allow "./" in the attribute "@xlink:href" #278

Closed (spacid closed this issue 1 month ago)

spacid commented 5 months ago

Hi,

Currently, the validator does not accept ./ as a relative path prefix in the attribute @xlink:href.

A (non-exhaustive) snippet from the root METS.xml to illustrate:

    <!-- ref to descriptive metadata about IE: MODS format -->
    <dmdSec ID="uuid-a75bdaf4-f9a8-49aa-9e92-dbe5ad164115">
        <mdRef LOCTYPE="URL" MDTYPE="MODS" xlink:type="simple" xlink:href="./metadata/descriptive/mods.xml" MIMETYPE="text/xml" SIZE="2806" CREATED="2023-11-16T00:00:00+02:00" CHECKSUM="79346c8fa8e4733199a738c7a518937d" CHECKSUMTYPE="MD5"/>
    </dmdSec>

    <!-- ref to the PREMIS metadata about IE/subIE(s)/package -->
    <amdSec>
        <digiprovMD ID="uuid-c47c6d69-7478-4e7b-8099-3e9d020f4140">
            <mdRef LOCTYPE="URL" MDTYPE="PREMIS" xlink:type="simple" xlink:href="./metadata/preservation/premis.xml" MIMETYPE="text/xml" SIZE="6606" CREATED="2023-11-16T00:00:00+02:00" CHECKSUM="08412c3c14471b2335a4b322611313ce" CHECKSUMTYPE="MD5"/>
        </digiprovMD>
    </amdSec>

    <!-- file section -->
    <fileSec ID="uuid-db9392e5-726f-4dbd-bb90-d1e65ef24392">

        <fileGrp USE="Representations/representation_1" ID="uuid-b11d435b-95dc-4474-838b-b63e54daea37">
            <file ID="uuid-dff65401-5270-43b1-ae7d-040bbf929d06" MIMETYPE="text/xml" SIZE="4471" CREATED="2023-11-16T00:00:00+02:00" CHECKSUM="18027a1bb8d57d171d765d8fd45949e7" CHECKSUMTYPE="MD5">
                <FLocat LOCTYPE="URL" xlink:type="simple" xlink:href="./representations/representation_1/METS.xml"/>
            </file>
        </fileGrp>
        <fileGrp USE="Representations/representation_2" ID="uuid-967f7420-74da-48eb-93e8-7a431ed9ca8e">
            <file ID="uuid-151ff14f-86ee-4ecc-b43d-d460b570c252" MIMETYPE="text/xml" SIZE="4482" CREATED="2023-11-16T00:00:00+02:00" CHECKSUM="d1acad2d4e1843a5f6fbbcd739fe23d0" CHECKSUMTYPE="MD5">
                <FLocat LOCTYPE="URL" xlink:type="simple" xlink:href="./representations/representation_2/METS.xml"/>
            </file>
        </fileGrp>
        <fileGrp USE="Representations/representation_3" ID="uuid-e180851d-e7b9-4193-87b3-151ab3044a22">
            <file ID="uuid-f3c176e5-99fb-4f9d-9e81-9c3f2322468c" MIMETYPE="text/xml" SIZE="11435" CREATED="2023-11-16T00:00:00+02:00" CHECKSUM="ee801ece27b7a9bd49b8cfb996eef2ed" CHECKSUMTYPE="MD5">
                <FLocat LOCTYPE="URL" xlink:type="simple" xlink:href="./representations/representation_3/METS.xml"/>
            </file>
        </fileGrp>

    </fileSec>

    <structMap ID="uuid-6f265a7d-214e-443d-b7ae-e3b6438bbf37" TYPE="PHYSICAL" LABEL="CSIP">
        <div ID="uuid-393197ad-a832-4ef5-93e8-a2c33ee7dc1d" LABEL="NEWSPAPER">
            <div ID="uuid-28889c30-a5f6-4bfb-8e41-b13145a7a4c8" LABEL="Metadata" ADMID="uuid-c47c6d69-7478-4e7b-8099-3e9d020f4140" DMDID="uuid-a75bdaf4-f9a8-49aa-9e92-dbe5ad164115"/>
            <div ID="uuid-73a25f03-22c6-4b52-8273-e80056b0cd17" LABEL="Representations/representation_1">
                <mptr xlink:type="simple" xlink:href="./representations/representation_1/METS.xml" LOCTYPE="URL" xlink:title="uuid-dff65401-5270-43b1-ae7d-040bbf929d06"/>
            </div>
            <div ID="uuid-8ea4b55c-0823-4ec0-a8ad-fa7f3a530793" LABEL="Representations/representation_2">
                <mptr xlink:type="simple" xlink:href="./representations/representation_2/METS.xml" LOCTYPE="URL" xlink:title="uuid-151ff14f-86ee-4ecc-b43d-d460b570c252"/>
            </div>
            <div ID="uuid-844099fa-b39e-4598-97da-9f97f5edb05d" LABEL="Representations/representation_3">
                <mptr xlink:type="simple" xlink:href="./representations/representation_3/METS.xml" LOCTYPE="URL" xlink:title="uuid-e180851d-e7b9-4193-87b3-151ab3044a22"/>
            </div>
        </div>
    </structMap>

This generates quite a few errors:

[
  {
    "specification": "CSIP-2.1.0",
    "id": "CSIP17",
    "name": "Descriptive metadata",
    "location": "mets/dmdSec",
    "description": "Must be used if descriptive metadata for the package content is available. Each descriptive metadata section ( <dmdSec> ) contains a single description and must be repeated for multiple descriptions, when available. It is possible to transfer metadata in a package using just the descriptive metadata section and/or administrative metadata section.",
    "cardinality": "0..n",
    "level": "SHOULD",
    "testing": {
      "outcome": "FAILED",
      "issues": [],
      "warnings": [
        "There are descriptive files not referenced: /metadata/descriptive/mods.xml,  and -2 more in Root METS.xml"
      ],
      "notes": []
    }
  },
  {
    "specification": "CSIP-2.1.0",
    "id": "CSIP66",
    "name": "File",
    "location": "mets/fileSec/fileGrp/file",
    "description": "The file group ( <fileGrp> ) contains the file elements which describe the file objects.",
    "cardinality": "1..n",
    "level": "MUST",
    "testing": {
      "outcome": "FAILED",
      "issues": [
        "You have files in SIP does not referenced in Root METS.xml"
      ],
      "warnings": [],
      "notes": []
    }
  },
  {
    "specification": "CSIP-2.1.0",
    "id": "CSIP24",
    "name": "Resource location",
    "location": "mets/dmdSec/mdRef/@xlink:href",
    "description": "The actual location of the resource. This specification recommends recording a URL type filepath in this attribute.",
    "cardinality": "1..1",
    "level": "MUST",
    "testing": {
      "outcome": "FAILED",
      "issues": [
        "mets/dmdSec/mdRef/@xlink:href uuid-f82f9e39-3760-4dcd-8f3d-7e482f231988/./metadata/descriptive/mods.xml in Root METS.xml does not exist"
      ],
      "warnings": [],
      "notes": []
    }
  },
  {
    "specification": "CSIP-2.1.0",
    "id": "CSIP110",
    "name": "Resource location",
    "location": "mets/structMap/div/div/mptr/@xlink:href",
    "description": "The actual location of the resource. We recommend recording a URL type filepath within this attribute.",
    "cardinality": "1..1",
    "level": "MUST",
    "testing": {
      "outcome": "FAILED",
      "issues": [
        "mets/structMap/div/div/mptr/@xlink:href  uuid-f82f9e39-3760-4dcd-8f3d-7e482f231988/./representations/representation_1/METS.xml doesn't exists (in Root METS.xml)"
      ],
      "warnings": [],
      "notes": []
    }
  },
  {
    "specification": "CSIP-2.1.0",
    "id": "CSIP43",
    "name": "File checksum",
    "location": "mets/amdSec/digiprovMD/mdRef/@CHECKSUM",
    "description": "The checksum of the referenced file.",
    "cardinality": "1..1",
    "level": "MUST",
    "testing": {
      "outcome": "FAILED",
      "issues": [
        "mets/dmdSec/mdRef/@CHECKSUM 08412c3c14471b2335a4b322611313ce in Root METS.xml and size of file (uuid-f82f9e39-3760-4dcd-8f3d-7e482f231988/./metadata/preservation/premis.xml) isn't equal"
      ],
      "warnings": [],
      "notes": []
    }
  },
  {
    "specification": "CSIP-2.1.0",
    "id": "CSIP41",
    "name": "File size",
    "location": "mets/amdSec/digiprovMD/mdRef/@SIZE",
    "description": "Size of the referenced file in bytes.",
    "cardinality": "1..1",
    "level": "MUST",
    "testing": {
      "outcome": "FAILED",
      "issues": [
        "mets/dmdSec/mdRef/@SIZE 6606 in Root METS.xml and size of file (uuid-f82f9e39-3760-4dcd-8f3d-7e482f231988/./metadata/preservation/premis.xml) isn't equal"
      ],
      "warnings": [],
      "notes": []
    }
  },
  {
    "specification": "CSIP-2.1.0",
    "id": "CSIP38",
    "name": "Resource location",
    "location": "mets/amdSec/digiprovMD/mdRef/@xlink:href",
    "description": "The actual location of the resource. This specification recommends recording a URL type filepath within this attribute.",
    "cardinality": "1..1",
    "level": "MUST",
    "testing": {
      "outcome": "FAILED",
      "issues": [
        "mets/amdSec/digiprovMD/mdRef/@xlink:href (uuid-f82f9e39-3760-4dcd-8f3d-7e482f231988/./metadata/preservation/premis.xml) doesn't exists (Root METS.xml)"
      ],
      "warnings": [],
      "notes": []
    }
  },
  {
    "specification": "CSIP-2.1.0",
    "id": "CSIP29",
    "name": "File checksum",
    "location": "mets/dmdSec/mdRef/@CHECKSUM",
    "description": "The checksum of the referenced file.",
    "cardinality": "1..1",
    "level": "MUST",
    "testing": {
      "outcome": "FAILED",
      "issues": [
        "mets/dmdSec/mdRef/@CHECKSUM 79346c8fa8e4733199a738c7a518937d in Root METS.xml and checksum of file (uuid-f82f9e39-3760-4dcd-8f3d-7e482f231988/./metadata/descriptive/mods.xml) isn't equal"
      ],
      "warnings": [],
      "notes": []
    }
  },
  {
    "specification" : "CSIP-2.1.0",
    "id" : "CSIP27",
    "name" : "File size",
    "location" : "mets/dmdSec/mdRef/@SIZE",
    "description" : "Size of the referenced file in bytes.",
    "cardinality" : "1..1",
    "level" : "MUST",
    "testing" : {
      "outcome" : "FAILED",
      "issues" : [ "mets/dmdSec/mdRef/@SIZE 2806 in Root METS.xml and size of file (uuid-f82f9e39-3760-4dcd-8f3d-7e482f231988/./metadata/descriptive/mods.xml) isn't equal" ],
      "warnings" : [ ],
      "notes" : [ ]
    }
  },
  {
    "specification" : "CSIP-2.1.0",
    "id" : "CSIP31",
    "name" : "Administrative metadata",
    "location" : "mets/amdSec",
    "description" : "If administrative / preservation metadata is available, it must be described using the administrative metadata section ( <amdSec> ) element. All administrative metadata is present in a single <amdSec> element. It is possible to transfer metadata in a package using just the descriptive metadata section and/or administrative metadata section.",
    "cardinality" : "0..1",
    "level" : "SHOULD",
    "testing" : {
      "outcome" : "FAILED",
      "issues" : [ ],
      "warnings" : [ "There are administrative files not referenced: /metadata/preservation/premis.xml,  and -2 more in Root METS.xml" ],
      "notes" : [ ]
    }
  }
]

Removing ./ results in a successful validation report. However, I believe the spec should allow this, or am I misinterpreting it? For example, the description of CSIP38 reads:

mets/amdSec/digiprovMD/mdRef/@xlink:href
The actual location of the resource. This specification recommends recording a URL type filepath within this attribute.

As a side note, the issue message reported for CSIP66 contains a grammatical error:

As is:

      "issues": [
        "You have files in SIP does not referenced in Root METS.xml"
      ],

To be, for example:

      "issues": [
        "You have files in the SIP that are not referenced in the Root METS.xml"
      ],

milvld commented 1 month ago

Hi @luis100 and @hmiguim, can you confirm that this is indeed a bug in the validator code and not the expected behaviour as per the E-ARK specification? :) That way we can close the issue internally and know whether we have to change our samples.

luis100 commented 1 month ago

Hello,

In my interpretation, the E-ARK specification allows any relative URL, which includes the use of . and .. segments. The validator, however, does a string comparison between the paths in the AIP (which are listed without any . or ..) and the URLs listed in the METS files, which results in a failure.

The paths should therefore be normalized before comparing, but this is currently not done. This is a bug, or a lack of support, in the validator.
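To illustrate the kind of normalization meant here, the following is a minimal sketch and not the validator's actual code or the fix itself. The class name `HrefNormalizer` and its method are hypothetical. It assumes the `@xlink:href` value is a plain relative file path (no scheme, no percent-encoding) and uses only `java.nio.file.Path.normalize()` from the standard library.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of commons-ip.
public class HrefNormalizer {

    /**
     * Removes "." segments and collapses "name/.." pairs so that the href
     * can be compared as a string with paths listed without them.
     * Separators are forced to "/" so the result is the same on Windows.
     */
    public static String normalize(String href) {
        Path normalized = Paths.get(href).normalize();
        return normalized.toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        // Both forms of the reference end up as the same string.
        String withDot = normalize("./metadata/descriptive/mods.xml");
        String withoutDot = normalize("metadata/descriptive/mods.xml");
        System.out.println(withDot);                    // metadata/descriptive/mods.xml
        System.out.println(withDot.equals(withoutDot)); // true
    }
}
```

With both sides of the comparison passed through the same normalization, `./metadata/descriptive/mods.xml` and `metadata/descriptive/mods.xml` compare as equal. If the hrefs can be percent-encoded URLs, `java.net.URI.normalize()` may be the better fit.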

PS: note that allowing .. raises security concerns that must also be dealt with to avoid path traversal.
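One common way to guard against this, shown here as a sketch under assumptions and not as what the validator does, is to resolve the href against the package root and reject it if the normalized result leaves that root. The class name `HrefContainment`, the method name, and the example root directory are hypothetical. Symbolic links are not considered here; handling them would need something like `Path.toRealPath()` on existing files.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of commons-ip.
public class HrefContainment {

    /**
     * Returns true only if href, resolved against packageRoot, still points
     * inside packageRoot after "." and ".." segments are removed.
     * Path.startsWith compares whole path elements, so "/pkg-other" is not
     * treated as being inside "/pkg".
     */
    public static boolean staysInsidePackage(Path packageRoot, String href) {
        Path root = packageRoot.toAbsolutePath().normalize();
        // resolve() returns href unchanged if it is absolute, which then fails the check.
        Path resolved = root.resolve(href).normalize();
        return resolved.startsWith(root);
    }

    public static void main(String[] args) {
        Path root = Paths.get("/tmp/example-package"); // assumed location, for illustration only
        System.out.println(staysInsidePackage(root, "./metadata/preservation/premis.xml")); // true
        System.out.println(staysInsidePackage(root, "../outside.xml"));                     // false
    }
}
```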

hmiguim commented 1 month ago

Fixed in #286