RKrahl closed this issue 3 years ago
Three of us had a brief discussion in yesterday's call, which I would like to summarize here (at least the key points):
I think we understand that for machine readability and processability we need controlled lists. A controlled list lets machines know that an alternate identifier for "serial number" needs to be looked up with, and only with, the string, say, SerialNumber (and not Serial Number, serialNumber, serialNummer, etc.). The controlled list enables validation at the time of metadata registration and thus ensures that we have harmonized metadata. This is important because there are use cases that rely on this harmonization. For instance, a researcher may be on a ship, looking at an instrument that has a serial number engraved in its body. The researcher now wants to retrieve the metadata and the instrument landing page based on the serial number. With a controlled list, I believe the infrastructure can unambiguously map the serial number onto the PIDINST and thus onto the landing page and all related metadata.
Controlled lists are, on the other hand, tricky because they impose a strong constraint on accepted metadata and could thus exclude metadata that is relevant to some communities but not to others. We discussed the possibility of including an option Other (which would carry no real semantics for machines) and perhaps a mechanism whereby, if Other is selected, a further attribute "name" (or similar) can be provided. As an example:
<AlternateIdentifier alternateIdentifierType="Other" name="UDID">123.xyz</AlternateIdentifier>
<AlternateIdentifier alternateIdentifierType="SerialNumber">foo/bar</AlternateIdentifier>
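One way such a rule could be checked is sketched below with Python's standard xml.etree.ElementTree module. The controlled list, the check function, and the decision to make "name" mandatory (rather than optional) when the type is Other are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled list, including the catch-all "Other".
ALLOWED_TYPES = {"SerialNumber", "InventoryNumber", "Other"}


def check(element):
    """Return a list of problems found in one AlternateIdentifier element."""
    problems = []
    id_type = element.get("alternateIdentifierType")
    if id_type not in ALLOWED_TYPES:
        problems.append(f"type {id_type!r} is not in the controlled list")
    elif id_type == "Other" and not element.get("name"):
        # Assumption: "Other" must be accompanied by a free-text name.
        problems.append("type 'Other' requires a name attribute")
    return problems


ok = ET.fromstring(
    '<AlternateIdentifier alternateIdentifierType="Other" name="UDID">123.xyz</AlternateIdentifier>'
)
bad = ET.fromstring(
    '<AlternateIdentifier alternateIdentifierType="Other">123.xyz</AlternateIdentifier>'
)
print(check(ok))   # no problems expected
print(check(bad))  # one problem expected: the name is missing
```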
I think we also agree that having both a dedicated attribute for serial number and the possibility to include a serial number as an alternate identifier (especially if the latter type isn't controlled) is hardly a desirable design. It doesn't address the heterogeneity of alternate identifier types, and it also introduces the possibility that some people will use the dedicated property, some will use the alternate identifier mechanism, and some might even use both. It will be hard for infrastructure to handle this heterogeneity.
It further seems that a controlled list of alternate identifier types does solve the issue of metadata machine readability. Indeed, machines can rely on finding serial numbers by looking at the element with attribute alternateIdentifierType="SerialNumber". Of course, some registered instruments may not have a serial number, but machines can conclusively infer that as well: if no AlternateIdentifier element with such an attribute exists, then no serial number is provided in the metadata for this instrument (i.e. there is no need to handle other cases in order to reach that conclusion).
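The lookup described here could be sketched as follows with Python's standard xml.etree.ElementTree module. The Instrument wrapper element, the sample record, and the serial_numbers helper are hypothetical and stand in for whatever the real record structure turns out to be.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the Instrument wrapper element is an assumption.
RECORD = """
<Instrument>
  <AlternateIdentifier alternateIdentifierType="Other" name="UDID">123.xyz</AlternateIdentifier>
  <AlternateIdentifier alternateIdentifierType="SerialNumber">foo/bar</AlternateIdentifier>
</Instrument>
"""


def serial_numbers(xml_text):
    """Return all serial numbers in a record; an empty list means none is provided."""
    root = ET.fromstring(xml_text)
    path = ".//AlternateIdentifier[@alternateIdentifierType='SerialNumber']"
    return [el.text for el in root.findall(path)]


print(serial_numbers(RECORD))
print(serial_numbers("<Instrument/>"))
```

With a controlled list, an empty result is conclusive: there is no other spelling of the type that the code would also have to try.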
As such, I tend to agree that, with a controlled list, the AlternateIdentifier mechanism is sufficient and flexible, and avoids the possibility of including the same information in multiple ways. However, it needs a controlled list in order to guarantee machine readability the way a dedicated attribute does.
Suggested values thus do not sufficiently address this issue, since we may suggest "SerialNumber" but someone may still use "Serial Number" and the infrastructure cannot (easily) validate this.
We can allow for changes to these controlled lists by maintaining schema versions. Hence, the community can suggest new types which we can include in a future version.
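A rough sketch of how validation could be tied to a schema version, so that a type proposed by the community becomes valid only from the version that introduces it. The version numbers, the list contents, and the is_valid_type helper are made up for illustration.

```python
# Hypothetical version numbers and list contents, for illustration only.
CONTROLLED_LISTS = {
    "1.0": {"SerialNumber", "InventoryNumber", "Other"},
    "1.1": {"SerialNumber", "InventoryNumber", "Other", "UDID"},
}


def is_valid_type(schema_version, identifier_type):
    """Check a type against the controlled list of the given schema version."""
    try:
        allowed = CONTROLLED_LISTS[schema_version]
    except KeyError:
        raise ValueError(f"unknown schema version: {schema_version!r}") from None
    return identifier_type in allowed


print(is_valid_type("1.0", "UDID"))
print(is_valid_type("1.1", "UDID"))
```

Records registered under an older version stay valid against that version's list, while new records can use the extended one.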
There are now a number of issues here that tackle this topic, at least #5, #15 and #20. Also, Rolf has opened #24, and I am inclined to move this discussion to #24 and close the other issues.
I have the use case of wanting to add identifiers from organisational systems such as finance or asset systems. I'm also expecting to add identifiers (URLs) for QR codes that might be used on an instrument on different missions.
Referring to @markusstocker's suggestion above: if alternateIdentifierType is to become a controlled vocabulary, then it would be useful to be able to distinguish between identifiers that have alternateIdentifierType="Other". I think a free-text name (or alternateIdentifierName?) would address this adequately for me.
We still have not finalized the controlled lists of values for several properties. I hereby open individual issues for each of them.
AlternateIdentifier is another identifier of the same instrument. The subproperty alternateIdentifierType should indicate what kind of an identifier this is. Note that at the moment we have free text for alternateIdentifierType, but we provide a suggested list of values. At the moment, we have: serialNumber and inventoryNumber.
What other types of alternate identifiers are missing?
Note that there are ongoing related discussions in #5 and #15. We would need to wait at least for a resolution of these issues before deciding on this one.