mets / METS-board

Documents and wiki pages relevant to the business of the METS Editorial Board

Primer Xlink Issue #19

Open thabing opened 7 years ago

thabing commented 7 years ago

Hi all,

I have just discovered that the primer does not mention the xlink:type attribute alongside xlink:href at all. All the other attributes are listed, but not that one, and it is mandatory in the XLink schema we use and in the XLink standard (as I have understood it). I got the comment that since it is not mentioned, people assume it is not meant to be used, and they then run into validation problems.

So can we update the text?

Best, Karin

thabing commented 7 years ago

And to add: in smLinkType, where the xlink attributes are referenced individually, the type attribute is missing.

Best, Karin

leahprescott commented 6 years ago

Leah will make changes to the Primer. We will need to ask Glenn to make changes to the overview page.

aelkiss commented 5 years ago

The XLink specification itself does not require xlink:type as I understand it - one of the options for conformance for an element is if "it does not have a type attribute from the XLink namespace and it adheres to the conformance constraints imposed by the XLink simple element type, as prescribed in this specification." (https://www.w3.org/TR/xlink11/#markup-reqs).

Additionally, the spec says later on "The value of the type attribute must be supplied unless the element is a simple link (https://www.w3.org/TR/xlink11/#dt-simplelink) and an href attribute in the XLink namespace is supplied. In the latter case, the value "simple" is implied for the type attribute. If a value is supplied for the type attribute, its value must be one of "simple", "extended", "locator", "arc", "resource", "title", or "none"." (https://www.w3.org/TR/xlink11/#link-types)
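To make the quoted rule concrete, here is a minimal sketch of the same mdRef written both ways. Under the XLink 1.1 wording above, the first form is a simple link whose type is implied to be "simple"; the second states the type explicitly. The attribute values are taken from the example in this thread, and the ID dmd002 is made up for illustration. Whether a given validator accepts both forms depends on which XLink schema it actually loads, so treat this as an illustration rather than a tested claim.

```xml
<?xml version="1.0"?>
<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- xlink:type omitted: per the XLink 1.1 text quoted above, "simple" is implied -->
  <dmdSec ID="dmd001">
    <mdRef LOCTYPE="URN" MDTYPE="EAD" xlink:href="urn:x-nyu:fales1735"/>
  </dmdSec>
  <!-- xlink:type stated explicitly (dmd002 is a hypothetical ID) -->
  <dmdSec ID="dmd002">
    <mdRef LOCTYPE="URN" MDTYPE="EAD" xlink:type="simple" xlink:href="urn:x-nyu:fales1735"/>
  </dmdSec>
  <structMap>
    <div/>
  </structMap>
</mets>
```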

My understanding was that our xlink schema was what caused xlink:type to be required in this case, but I'm having trouble reproducing the validation error in Oxygen. This example validates fine for me:

<?xml version="1.0"?>
<mets xmlns="http://www.loc.gov/METS/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/terms/"
    xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd 
    http://www.w3.org/1999/xlink http://www.loc.gov/standards/xlink/xlink.xsd">
    <metsHdr />
    <dmdSec ID="dmd001">
        <mdRef LOCTYPE="URN" MIMETYPE="application/xml" MDTYPE="EAD" xlink:href="urn:x-nyu:fales1735" />
    </dmdSec>   
    <structMap>
        <div/>
    </structMap>
</mets>

Could someone provide an example where xlink:type is not provided but xlink:href is, and the document fails validation?

thabing commented 5 years ago

Unable to reproduce the validation failure. We need an example.

karinbredenberg commented 5 years ago

It's an old thing (2007 and on). One of the problems I find in my mail archive (a mail to the METS list from 2007) is that it arises when you embed EAD2002 into METS: the XLink schemas start to collide when they all use the same XLink namespace but point to different schemas with different optional attributes. So if you haven't moved to EAD3, the error might still occur. And it goes on: it happens when you embed anything that uses a different XLink schema under the same XLink namespace. Still, as I read the XLink schema, in the simpleLink attribute group, which as I understand it is what the METS schema uses, the type attribute is not optional, so we should use it.

karinbredenberg commented 5 years ago

I have gone through my archive, and the comment from the 24th of January above is still the most informative thing I have. Having the same namespace but using different schema versions of XLink causes the problem.

aelkiss commented 5 years ago

The EAD2002 schema https://www.loc.gov/ead/ead.xsd and the METS schema both import http://www.loc.gov/standards/xlink/xlink.xsd. The following minimal example that includes both the EAD and METS schema along with the XLink schema validates for me:

<?xml version="1.0"?>
<mets xmlns="http://www.loc.gov/METS/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:ead="urn:isbn:1-931666-22-9" xmlns:xlink="http://www.w3.org/1999/xlink"
  xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd
  urn:isbn:1-931666-22-9 https://www.loc.gov/ead/ead.xsd
  http://www.w3.org/1999/xlink http://www.loc.gov/standards/xlink/xlink.xsd  
  ">
  <metsHdr/>
  <dmdSec ID="DMD1">
    <mdWrap MDTYPE="EAD">
      <xmlData>
        <ead:ead>
          <ead:eadheader>
            <ead:eadid/>
            <ead:filedesc>
              <ead:titlestmt>
                <ead:titleproper/>
              </ead:titlestmt>
            </ead:filedesc>
          </ead:eadheader>
          <ead:frontmatter/>
          <ead:archdesc level="item">
            <ead:runner/>
            <ead:did>
              <ead:abstract/>
            </ead:did>
          </ead:archdesc>
        </ead:ead>
      </xmlData>
    </mdWrap>
  </dmdSec>
  <dmdSec ID="DMD2">
    <mdRef LOCTYPE="URN" MIMETYPE="application/xml" MDTYPE="EAD" xlink:href="urn:x-nyu:fales1735" />
  </dmdSec>   
  <structMap>
    <div/>
  </structMap>
</mets>

andreasnef commented 5 years ago

I also only have a vague memory of such a problem many years ago. However, two things that I discovered while trying to find the context:

aelkiss commented 5 years ago

For http://www.w3.org/1999/xlink.xsd (and the basically-identical https://www.w3.org/XML/2008/06/xlink.xsd) the issue is that the attribute groups have different names (e.g. simpleAttrs in the W3C schema vs. simpleLink in the LOC schema.)
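As a rough illustration of the naming mismatch, the fragment below shows how a schema that imports an XLink schema might reference the simple-link attribute group. The type name exampleLocatorType is hypothetical and the fragment is not copied from the METS schema; only the two group names (simpleLink and simpleAttrs) come from the discussion above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Import of the LOC XLink schema, which defines the simpleLink group -->
  <xsd:import namespace="http://www.w3.org/1999/xlink"
              schemaLocation="http://www.loc.gov/standards/xlink/xlink.xsd"/>

  <!-- Hypothetical type name, for illustration only -->
  <xsd:complexType name="exampleLocatorType">
    <!-- Group name as defined by the LOC XLink schema -->
    <xsd:attributeGroup ref="xlink:simpleLink"/>
    <!-- If the W3C XLink schema were loaded instead, this reference would
         not resolve; the equivalent group there is named simpleAttrs:
         <xsd:attributeGroup ref="xlink:simpleAttrs"/> -->
  </xsd:complexType>
</xsd:schema>
```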

The particular errors you would get then appear to be dependent on the order the schemas are declared in xsi:schemaLocation. I am not sure whether there is a normative specification for how namespaces with multiply-declared schema should be handled, but at least for Xerces (via JHOVE or Oxygen) the first one loaded seems to win. It would be worth testing with another XML Schema implementation to see if it exhibits the same behavior.

I've outlined some possible paths forward below. I did some testing with a simple example that just imports a different xlink schema in addition to the METS schema, but I do want to do a little bit of testing with some more complex examples to validate these approaches.

1) Document and promote a workaround. If we produce an XLink schema that imports the W3C schema (and therefore includes both the simpleLink and simpleAttrs attribute groups), then we could tell people running into this issue to list it as the first xsi:schemaLocation value, so that it is the schema that gets used for XLink. I think this would be a good option at least for the short term, since it doesn't have any potential for breakage (for people not already running into this issue) and doesn't require any coordination with other groups.

2) Longer term, we could work towards updating the METS, PREMIS, and EAD schemas to reference the W3C schema instead of the LOC schema. At least for the METS schema, this seems to be OK if you update the attribute group names. I haven't looked at the PREMIS or EAD schemas to see if it would cause issues there. This would require a lot of coordination and testing with existing files, but should ultimately eliminate the problem.

3) Update the LOC xlink schema to be the workaround from #1. That has the same issue with needing to test existing METS files to ensure nothing breaks, but wouldn't need as much coordination with other schemas that reference the LOC xlink schema, since it would just be an addition to the schema rather than a change.
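For option 1, the instance-side change would be small. A sketch, assuming a workaround schema published at a made-up URL (https://example.org/schemas/xlink-workaround.xsd is a placeholder, not a real location) and assuming a processor where the first schema loaded for a namespace wins, as observed with Xerces above:

```xml
<?xml version="1.0"?>
<!-- The XLink pair is listed first so that it is the XLink schema loaded.
     The workaround schema URL is hypothetical. -->
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xsi:schemaLocation="http://www.w3.org/1999/xlink https://example.org/schemas/xlink-workaround.xsd
                          http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd">
  <metsHdr/>
  <structMap>
    <div/>
  </structMap>
</mets>
```

Other processors may resolve multiple schema locations for one namespace differently, so the ordering trick should be tested per implementation.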

aelkiss commented 5 years ago

I still can't reproduce the original issue (validation complaint about missing xlink:type with xlink:href) even with the W3C XLink schema - as-is, it just complains that the simpleLink or simpleAttrs groups aren't defined depending on the schema load order, and a simple example validates just fine with a version of the METS schema that references the attribute groups from the W3C XLink schema.

aelkiss commented 5 years ago

The "workaround schema" can't just import the W3C schema for a couple reasons - one, you can't import a schema for the same namespace as the one you're defining a schema for; two, it declares the same attributes as the LOC schema. Still, it looks like a compromise schema that declares the attributes once but declares the attribute groups both from the LOC and W3C schemas is possible.
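A minimal sketch of what such a compromise schema could look like is given here. It is not the schema from any existing file; the attribute list is deliberately cut down to type and href, and the datatypes are assumptions. The point is only the shape: global attributes declared once, with the attribute group available under both the LOC name (simpleLink) and the W3C name (simpleAttrs).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           targetNamespace="http://www.w3.org/1999/xlink">

  <!-- Global attributes, declared exactly once (datatypes are assumptions) -->
  <xs:attribute name="type" type="xs:token"/>
  <xs:attribute name="href" type="xs:anyURI"/>

  <!-- Group under the name used by the LOC XLink schema -->
  <xs:attributeGroup name="simpleLink">
    <!-- Optional; if present, the value must be "simple" -->
    <xs:attribute ref="xlink:type" fixed="simple"/>
    <xs:attribute ref="xlink:href"/>
  </xs:attributeGroup>

  <!-- Same content under the name used by the W3C XLink schema -->
  <xs:attributeGroup name="simpleAttrs">
    <xs:attributeGroup ref="xlink:simpleLink"/>
  </xs:attributeGroup>
</xs:schema>
```

A real version would also need the remaining XLink attributes (role, arcrole, title, show, actuate and so on) and any other groups that the importing schemas reference.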

aelkiss commented 5 years ago

I think I would definitely not recommend option 3 above (to make the LOC XLink schema into this compromise schema) as the compromise schema is definitely a hack. But I think having the compromise schema available (option 1) is a good option. We can talk if option 2 (changing existing schemas to use the W3C XLink schema) is something to work towards longer term or if option 1 is sufficient.

aelkiss commented 5 years ago

I verified the following with a more complex METS file. If I 1) change the METS schema to reference the W3C XLink schema (and update the referenced attribute groups appropriately) and then 2) validate a METS file that references both this changed METS schema and the existing PREMIS schema, it does not validate. If I prepend the compromise XLink schema to the xsi:schemaLocation attribute, it does. I will post the compromise XLink schema and the updated METS schema as a gist.

karinbredenberg commented 5 years ago

EAD had its own XLink schema way back around 2007. When I raised the problem, coordination was undertaken to make the schemas the same, but it took some time.

aelkiss commented 5 years ago

@andreasnef I looked through the SEDA site at https://francearchives.fr/seda/, but I don't see any complete examples of SEDA XML. Do you have an example either just of SEDA or of METS that embeds SEDA?

aelkiss commented 5 years ago

Compromise XLink schema: https://gist.github.com/aelkiss/4d3fab39219f6cea23bc0e0e9b0ae3f3
METS schema referencing the W3C XLink schema: https://gist.github.com/aelkiss/0f1e69468a92afda86a6098f5613e99d

aelkiss commented 5 years ago

A fourth option would be to remove the XLink schema entirely from the METS schema.

Bertrand mentions that this option has been brought up a few times in the past.

In talking with people from W3C, they thought the XLink schema was dead. It has not been widely adopted; web browsers don't understand XLink. EAD2002 does use XLink. EAD3 does not use XLink at all. The PREMIS2 schema at least imports the LOC XLink schema. It does not appear that PREMIS3 references the XLink schema at all.

Not aware of any users who really take advantage of the XLink attributes. Bertrand would be in favor of getting rid of it.

andreasnef commented 5 years ago

Quoting aelkiss: "@andreasnef I looked through the SEDA site at https://francearchives.fr/seda/, but I don't see any complete examples of SEDA XML. Do you have an example either just of SEDA or of METS that embeds SEDA?"

I just verified some of our reference examples for SEDA, but there are none that actually use xlink attributes. I checked with the SEDA (2.1) schemas, and while all of them define the namespace, only a couple actually import the xlink.xsd, and only one of these two actually specifies an (optional) attribute of this namespace.

So, the SEDA example is probably yet another case where it was introduced a while ago and now has the same issues as with METS...

ntra00 commented 5 years ago

On Aaron's option 2 above ("Longer term, we could work towards updating the METS, PREMIS, and EAD schemas to reference the W3C schema instead of the LOC schema."): the reason we came up with our own (for MODS and METS) is that the W3C site was being hit by validators every time a METS file was accessed.

BertrandCaron commented 5 years ago

I asked Baptiste Nichele (who was until recently working at the Service interministériel des archives de France on the SEDA standard) and he confirmed that the XLink dependency was introduced in version 2. It was inherited from the MEDONA standard, but he considers that another simple way of referencing would have done the trick as well.